Hmm. I can tell you that *typically* we see riak-admin transfers show
many partitions awaiting transfer. If you run the transfers command, it
resets the timer for transfers to complete, so don't run it too often.
The total number of partitions awaiting transfer should slowly decrease.
When zero partitions are waiting to hand off, then you may see
riak-admin ring_status waiting to finish ownership changes. Sometimes it
gets stuck on [riak_kv_vnode], in which case force-handoffs seems to do
the trick. Then it can *also* get stuck on [], and then the long snippet
I linked to does the trick.
So: give it 15 minutes, and check to see if fewer partitions are
awaiting transfer. If you're eager, you can watch the logs for handoff
messages or iptraf that sucker to see the handoff network traffic
directly; it runs on a distinct port IIRC so it's easy to track.
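A rough sketch of that monitoring loop, for reference. The handoff port is
whatever handoff_port is set to in the riak_core section of app.config;
8099 is just the usual default, and the log path and interface below are
assumptions too:

    # how many partitions are still awaiting transfer (run sparingly)
    riak-admin transfers

    # watch a node's log for handoff activity (path depends on your install)
    tail -f log/console.log | grep -i handoff

    # watch handoff traffic on the wire (adjust interface and port)
    tcpdump -i eth0 port 8099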
--Kyle
On 01/18/2012 02:40 PM, Fredrik Lindström wrote:
I just ran the two commands on all 4 nodes.
When run on one of the original nodes, the first command
(riak_core_ring_manager:force_update()) results in output like the
following in the console of the new node:
<snip>
23:20:06.928 [info] loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
23:20:06.929 [info] opened buffer
'./data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
23:20:06.929 [info] finished loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
with rollover size 912261.12
23:20:07.006 [info] loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
23:20:07.036 [info] opened buffer
'./data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
23:20:07.036 [info] finished loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
with rollover size 1132462.08
23:20:47.050 [info] loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
23:20:47.054 [info] opened buffer
'./data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
23:20:47.055 [info] finished loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
with rollover size 975175.6799999999
</snip>
riak_core_vnode_manager:force_handoffs() does not produce any output on
any console on any node besides "OK". No tasty handoff log messages to
be found.
Furthermore I'm not sure what to make of the output from riak-admin
transfers:
't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 42 partitions
Our second node (qbkpx02) is missing from that list. The output also
states that the new node (test) wants to hand off 62 partitions even
though it owns 0 partitions.
riak-admin ring_status lists various pending ownership handoffs, all of
them are between our 3 original nodes. The new node is not mentioned
anywhere.
I'm really curious about the current state of our cluster. It does look
rather exciting :)
/F
------------------------------------------------------------------------
*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 11:15 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
Did you try riak_core_ring_manager:force_update() and force_handoffs()
on the old partition owner as well as the new one? Can't recall off the
top of my head which one needs to execute that handoff.
--Kyle
On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:
Thanks for the response Aphyr.
I'm seeing Waiting on:
[riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [] so I'm
thinking it's a different scenario.
It might be worth mentioning that the data directory on the new node
does contain relevant subdirectories but the disk footprint is so
small I doubt any data has been transferred.
/F
------------------------------------------------------------------------
*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 10:46 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
If partition transfer is blocked awaiting [] (as opposed to [kv_vnode]
or whatever), there's a snippet in there that might be helpful.
--Kyle
On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
After some digging I found a suggestion from Joseph Blomstedt in an
earlier mail thread
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
to run the following in the Riak console:
riak_core_ring_manager:force_update().
riak_core_vnode_manager:force_handoffs().
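Concretely, running those boils down to something like this (node name is
a placeholder):

    $ riak attach
    (riak@node1)1> riak_core_ring_manager:force_update().
    (riak@node1)2> riak_core_vnode_manager:force_handoffs().
    %% both should just return ok; detach with Ctrl-D (q() would stop the node)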
The symptoms would appear to be the same although the cluster
referenced in the mail thread does not appear to have search enabled,
as far as I can tell from the log snippets. The mail thread doesn't
really specify which node to run the commands on so I tried both the
new node and the current claimant of the cluster.
Sadly the suggested steps did not produce any kind of ownership handoff.
Any helpful ideas would be much appreciated :)
/F
------------------------------------------------------------------------
*From:* riak-users-boun...@lists.basho.com [riak-users-boun...@lists.basho.com]
on behalf of Fredrik Lindström [fredrik.lindst...@qbranch.se]
*Sent:* Wednesday, January 18, 2012 4:00 PM
*To:* riak-users@lists.basho.com
*Subject:* Pending transfers when joining 1.0.3 node to 1.0.0 cluster
Hi everyone,
When we try to join a 1.0.3 node to an existing 1.0.0 (3-node)
cluster, the ownership transfer doesn't appear to take place. I'm
guessing that we're making some stupid little mistake but we can't
figure it out at the moment. Anyone run into something similar?
Riak Search is enabled on the original nodes in the cluster as well
as the new node.
Ring size is set to 128
The various logfiles do not appear to contain any errors or warnings
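For reference, the standard join sequence on 1.0.x is roughly the
following (node names here are placeholders, not our real ones):

    # on the new (1.0.3) node, once riak is started
    riak-admin join riak@existing-node.example.com

    # then, from any node, watch membership and handoff progress
    riak-admin member_status
    riak-admin transfers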
Output from riak-admin member_status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     33.6%      25.0%    'qbkp...@qbkpx01.ad.qnet.local'
valid     33.6%      25.0%    'qbkp...@qbkpx02.ad.qnet.local'
valid     32.8%      25.0%    'qbkp...@qbkpx03.ad.qnet.local'
valid      0.0%      25.0%    't...@qbkpxadmin01.ad.qnet.local'
-------------------------------------------------------------------------------
Output from riak-admin ring_status
See attached file
Output from riak-admin transfers
't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
/F
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com