If you're running 1.0.0 and not 1.0.1 or later, you're probably running into this bug: https://issues.basho.com/show_bug.cgi?id=1242
In short, in Riak 1.0.0, there was a bug in the release that prevented handoff from occurring on nodes if riak_search was enabled.

The most straightforward solution is to perform an in-place upgrade of all your 1.0.0 nodes: stop one node, install 1.0.2 or 1.0.3, restart the node. Repeat for the other nodes. Another option would be to disable riak_search, restart each node one by one, and then issue riak_core_ring_manager:force_update() on the claimant node. Not sure which is less disruptive for your particular case. Unfortunately, this particular bug is basically impossible to address without switching to newer code.
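For reference, the two options boil down to roughly the following. This is only a sketch: it assumes a packaged install with the riak and riak-admin scripts on the PATH, and the riak_search stanza shown is from memory, so double-check it against your own app.config.

  ## Option 1: rolling in-place upgrade, one node at a time
  riak stop
  # ...install the 1.0.2 or 1.0.3 package for your platform...
  riak start
  riak-admin member_status   # confirm the node is back before moving to the next one

  ## Option 2: disable riak_search on every node, e.g. in app.config:
  #   {riak_search, [{enabled, false}]}
  # then restart the nodes one at a time, and on the claimant node run:
  riak attach
  #   riak_core_ring_manager:force_update().

Either way, riak-admin transfers should show the number of partitions awaiting handoff start to drop once handoff is unstuck.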
-Joe

2012/1/18 Aphyr <ap...@aphyr.com>:
> Hmm. I can tell you that *typically* we see riak-admin transfers show many partitions awaiting transfer. If you run the transfers command it resets the timer for transfers to complete, so don't do it too often. The total number of partitions awaiting transfer should slowly decrease.
>
> When zero partitions are waiting to hand off, then you may see riak-admin ring_status waiting to finish ownership changes. Sometimes it gets stuck on [riak_kv_vnode], in which case force-handoffs seems to do the trick. Then it can *also* get stuck on [], and then the long snippet I linked to does the trick.
>
> So: give it 15 minutes, and check to see if fewer partitions are awaiting transfer. If you're eager, you can watch the logs for handoff messages or iptraf that sucker to see the handoff network traffic directly; it runs on a distinct port IIRC so it's easy to track.
>
> --Kyle
>
> On 01/18/2012 02:40 PM, Fredrik Lindström wrote:
>>
>> I just ran the two commands on all 4 nodes.
>>
>> When run on one of the original nodes the first command (riak_core_ring_manager:force_update()) results in output like the following in the console of the new node:
>> <snip>
>> 23:20:06.928 [info] loading merge_index './data/merge_index/331121464707782692405522344912282871640797216768'
>> 23:20:06.929 [info] opened buffer './data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
>> 23:20:06.929 [info] finished loading merge_index './data/merge_index/331121464707782692405522344912282871640797216768' with rollover size 912261.12
>> 23:20:07.006 [info] loading merge_index './data/merge_index/730750818665451459101842416358141509827966271488'
>> 23:20:07.036 [info] opened buffer './data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
>> 23:20:07.036 [info] finished loading merge_index './data/merge_index/730750818665451459101842416358141509827966271488' with rollover size 1132462.08
>> 23:20:47.050 [info] loading merge_index './data/merge_index/513809169374145557180982949001818249097788784640'
>> 23:20:47.054 [info] opened buffer './data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
>> 23:20:47.055 [info] finished loading merge_index './data/merge_index/513809169374145557180982949001818249097788784640' with rollover size 975175.6799999999
>> </snip>
>>
>> riak_core_vnode_manager:force_handoffs() does not produce any output on any console on any node besides "OK". No tasty handover log messages to be found.
>>
>> Furthermore I'm not sure what to make of the output from riak-admin transfers:
>> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
>> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
>> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 42 partitions
>>
>> Our second node (qbkpx02) is missing from that list. The output also states that the new node (test) wants to handoff 62 partitions although it is the owner of 0 partitions.
>>
>> riak-admin ring_status lists various pending ownership handoffs, all of them between our 3 original nodes. The new node is not mentioned anywhere.
>>
>> I'm really curious about the current state of our cluster. It does look rather exciting :)
>>
>> /F
>> ------------------------------------------------------------------------
>> *From:* Aphyr [ap...@aphyr.com]
>> *Sent:* Wednesday, January 18, 2012 11:15 PM
>> *To:* Fredrik Lindström
>> *Cc:* riak-users@lists.basho.com
>> *Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>>
>> Did you try riak_core_ring_manager:force_update() and force_handoffs() on the old partition owner as well as the new one? Can't recall off the top of my head which one needs to execute that handoff.
>>
>> --Kyle
>>
>> On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:
>>
>>> Thanks for the response Aphyr.
>>>
>>> I'm seeing Waiting on: [riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [] so I'm thinking it's a different scenario.
>>> It might be worth mentioning that the data directory on the new node does contain relevant subdirectories but the disk footprint is so small I doubt any data has been transferred.
>>>
>>> /F
>>> ------------------------------------------------------------------------
>>> *From:* Aphyr [ap...@aphyr.com]
>>> *Sent:* Wednesday, January 18, 2012 10:46 PM
>>> *To:* Fredrik Lindström
>>> *Cc:* riak-users@lists.basho.com
>>> *Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>>>
>>> https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
>>>
>>> If partition transfer is blocked awaiting [] (as opposed to [kv_vnode] or whatever), there's a snippet in there that might be helpful.
>>>
>>> --Kyle
>>>
>>> On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
>>>
>>>> After some digging I found a suggestion from Joseph Blomstedt in an earlier mail thread:
>>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
>>>>
>>>> In the riak console:
>>>> riak_core_ring_manager:force_update().
>>>> riak_core_vnode_manager:force_handoffs().
>>>>
>>>> The symptoms would appear to be the same, although the cluster referenced in the mail thread does not appear to have search enabled, as far as I can tell from the log snippets. The mail thread doesn't really specify which node to run the commands on so I tried both the new node and the current claimant of the cluster.
>>>>
>>>> Sadly the suggested steps did not produce any kind of ownership handoff.
>>>>
>>>> Any helpful ideas would be much appreciated :)
>>>>
>>>> /F
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* riak-users-boun...@lists.basho.com [riak-users-boun...@lists.basho.com] on behalf of Fredrik Lindström [fredrik.lindst...@qbranch.se]
>>>> *Sent:* Wednesday, January 18, 2012 4:00 PM
>>>> *To:* riak-users@lists.basho.com
>>>> *Subject:* Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>>>>
>>>> Hi everyone,
>>>> when we try to join a 1.0.3 node to an existing 1.0.0 (3 node) cluster the ownership transfer doesn't appear to take place. I'm guessing that we're making some stupid little mistake but we can't figure it out at the moment. Anyone run into something similar?
>>>>
>>>> Riak Search is enabled on the original nodes in the cluster as well as the new node.
>>>> Ring size is set to 128.
>>>>
>>>> The various logfiles do not appear to contain any errors or warnings.
>>>>
>>>> Output from riak-admin member_status:
>>>> ================================= Membership ==================================
>>>> Status     Ring       Pending    Node
>>>> -------------------------------------------------------------------------------
>>>> valid      33.6%      25.0%      'qbkp...@qbkpx01.ad.qnet.local'
>>>> valid      33.6%      25.0%      'qbkp...@qbkpx02.ad.qnet.local'
>>>> valid      32.8%      25.0%      'qbkp...@qbkpx03.ad.qnet.local'
>>>> valid       0.0%      25.0%      't...@qbkpxadmin01.ad.qnet.local'
>>>> -------------------------------------------------------------------------------
>>>>
>>>> Output from riak-admin ring_status:
>>>> See attached file
>>>>
>>>> Output from riak-admin transfers:
>>>> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
>>>> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
>>>> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
>>>>
>>>> /F

--
Joseph Blomstedt <j...@basho.com>
Software Engineer
Basho Technologies, Inc.
http://www.basho.com/

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com