If you're running 1.0.0 and not 1.0.1 or later, you're probably
running into this bug:

In short, the Riak 1.0.0 release had a bug that prevented handoff
from occurring on nodes if riak_search was enabled. The most
straightforward solution is to perform an in-place upgrade of all your
1.0.0 nodes: stop one node, install 1.0.2 or 1.0.3, and restart the
node. Repeat for the other nodes.

Another option would be to disable riak_search, restart each node one
by one, and then issue riak_core_ring_manager:force_update() on the
claimant node. Not sure which is less disruptive for your particular

Unfortunately, this particular bug is basically impossible to address
without switching to newer code.
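
To check whether handoff is making progress after the upgrade, here is
a minimal Python sketch that totals the partitions still awaiting
handoff. It assumes the riak-admin transfers line format quoted further
down in this thread ('<node>' waiting to handoff N partitions); the
function names are hypothetical, and other Riak versions may print
something different. As Kyle notes below, running the transfers command
resets the handoff timer, so only feed it output captured every so
often.

```python
import re

# Assumed line format, taken from the output quoted in this thread:
#   'node@host' waiting to handoff 62 partitions
LINE_RE = re.compile(
    r"^'(?P<node>[^']+)' waiting to handoff (?P<count>\d+) partitions$"
)


def pending_handoffs(transfers_output):
    """Return {node: partitions awaiting handoff} parsed from the text."""
    pending = {}
    for line in transfers_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines in any other shape are ignored
            pending[match.group("node")] = int(match.group("count"))
    return pending


def total_pending(transfers_output):
    """Sum the pending partition counts across all reported nodes."""
    return sum(pending_handoffs(transfers_output).values())
```

Fed the three lines Fredrik quotes below (62, 42 and 42 partitions),
total_pending returns 146. Once handoff is working, that number should
shrink between captures.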


2012/1/18 Aphyr <ap...@aphyr.com>:
> Hmm. I can tell you that *typically* we see riak-admin transfers show many
> partitions awaiting transfer. If you run the transfers command it resets the
> timer for transfers to complete, so don't do it too often. The total number
> of partitions awaiting transfer should slowly decrease.
> When zero partitions are waiting to hand off, then you may see riak-admin
> ring_status waiting to finish ownership changes. Sometimes it gets stuck on
> [riak_kv_vnode], in which case force-handoffs seems to do the trick. Then it
> can *also* get stuck on [], and then the long snippet I linked to does the
> trick.
> So: give it 15 minutes, and check to see if fewer partitions are awaiting
> transfer. If you're eager, you can watch the logs for handoff messages or
> iptraf that sucker to see the handoff network traffic directly; it runs on a
> distinct port IIRC so it's easy to track.
> --Kyle
> On 01/18/2012 02:40 PM, Fredrik Lindström wrote:
>> I just ran the two commands on all 4 nodes.
>> When run on one of the original nodes the first
>> command (riak_core_ring_manager:force_update()) results in output like the
>> following in the console of the new node
>> <snip>
>> 23:20:06.928 [info] loading merge_index
>> './data/merge_index/331121464707782692405522344912282871640797216768'
>> 23:20:06.929 [info] opened buffer
>> './data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
>> 23:20:06.929 [info] finished loading merge_index
>> './data/merge_index/331121464707782692405522344912282871640797216768'
>> with rollover size 912261.12
>> 23:20:07.006 [info] loading merge_index
>> './data/merge_index/730750818665451459101842416358141509827966271488'
>> 23:20:07.036 [info] opened buffer
>> './data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
>> 23:20:07.036 [info] finished loading merge_index
>> './data/merge_index/730750818665451459101842416358141509827966271488'
>> with rollover size 1132462.08
>> 23:20:47.050 [info] loading merge_index
>> './data/merge_index/513809169374145557180982949001818249097788784640'
>> 23:20:47.054 [info] opened buffer
>> './data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
>> 23:20:47.055 [info] finished loading merge_index
>> './data/merge_index/513809169374145557180982949001818249097788784640'
>> with rollover size 975175.6799999999
>> </snip>
>> riak_core_vnode_manager:force_handoffs() does not produce any output on
>> any console on any node besides "OK". No tasty handover log messages to
>> be found.
>> Furthermore I'm not sure what to make of the output from riak-admin
>> transfers:
>> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
>> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
>> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 42 partitions
>> Our second node (qbkpx02) is missing from that list. The output also
>> states that the new node (test) wants to handoff 62 partitions although
>> it is the owner of 0 partitions.
>> riak-admin ring_status lists various pending ownership handoffs, all of
>> them are between our 3 original nodes. The new node is not mentioned
>> anywhere.
>> I'm really curious about the current state of our cluster. It does look
>> rather exciting :)
>> /F
>> ------------------------------------------------------------------------
>> *From:* Aphyr [ap...@aphyr.com]
>> *Sent:* Wednesday, January 18, 2012 11:15 PM
>> *To:* Fredrik Lindström
>> *Cc:* riak-users@lists.basho.com
>> *Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>> Did you try riak_core_ring_manager:force_update() and force_handoffs()
>> on the old partition owner as well as the new one? Can't recall off the
>> top of my head which one needs to execute that handoff.
>> --Kyle
>> On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:
>>> Thanks for the response Aphyr.
>>> I'm seeing Waiting on:
>>> [riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [] so I'm
>>> thinking it's a different scenario.
>>> It might be worth mentioning that the data directory on the new node
>>> does contain relevant subdirectories but the disk footprint is so
>>> small I doubt any data has been transferred.
>>> /F
>>> ------------------------------------------------------------------------
>>> *From:*Aphyr [ap...@aphyr.com]
>>> *Sent:*Wednesday, January 18, 2012 10:46 PM
>>> *To:*Fredrik Lindström
>>> *Cc:*riak-users@lists.basho.com
>>> *Subject:*Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>>> https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
>>> If partition transfer is blocked awaiting [] (as opposed to [kv_vnode]
>>> or whatever), there's a snippet in there that might be helpful.
>>> --Kyle
>>> On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
>>>> After some digging I found a suggestion from Joseph Blomstedt in an
>>>> earlier mail thread
>>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
>>>> in the riak console:
>>>> riak_core_ring_manager:force_update().
>>>> riak_core_vnode_manager:force_handoffs().
>>>> The symptoms would appear to be the same although the cluster
>>>> referenced in the mail thread does not appear to have search enabled,
>>>> as far as I can tell from the log snippets. The mail thread doesn't
>>>> really specify which node to run the commands on so I tried both the
>>>> new node and the current claimant of the cluster.
>>>> Sadly the suggested steps did not produce any kind of ownership handoff.
>>>> Any helpful ideas would be much appreciated :)
>>>> /F
>>>> ------------------------------------------------------------------------
>>>> *From:*riak-users-boun...@lists.basho.com [riak-users-boun...@lists.basho.com]
>>>> on behalf of Fredrik Lindström [fredrik.lindst...@qbranch.se]
>>>> *Sent:*Wednesday, January 18, 2012 4:00 PM
>>>> *To:*riak-users@lists.basho.com
>>>> *Subject:*Pending transfers when joining 1.0.3 node to 1.0.0 cluster
>>>> Hi everyone,
>>>> when we try to join a 1.0.3 node to an existing 1.0.0 (3 node)
>>>> cluster the ownership transfer doesn't appear to take place. I'm
>>>> guessing that we're making some stupid little mistake but we can't
>>>> figure it out at the moment. Anyone run into something similar?
>>>> Riak Search is enabled on the original nodes in the cluster as well
>>>> as the new node.
>>>> Ring size is set to 128
>>>> The various logfiles do not appear to contain any errors or warnings
>>>> Output from riak-admin member_status
>>>> ================================= Membership ==================================
>>>> Status     Ring    Pending    Node
>>>> -------------------------------------------------------------------------------
>>>> valid      33.6%   25.0%      'qbkp...@qbkpx01.ad.qnet.local'
>>>> valid      33.6%   25.0%      'qbkp...@qbkpx02.ad.qnet.local'
>>>> valid      32.8%   25.0%      'qbkp...@qbkpx03.ad.qnet.local'
>>>> valid       0.0%   25.0%      't...@qbkpxadmin01.ad.qnet.local'
>>>> -------------------------------------------------------------------------------
>>>> Output from riak-admin ring_status
>>>> See attached file
>>>> Output from riak-admin transfers
>>>> 't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
>>>> 'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
>>>> 'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
>>>> /F
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Joseph Blomstedt <j...@basho.com>
Software Engineer
Basho Technologies, Inc.
