Hmm. I can tell you that *typically* we see riak-admin transfers show
many partitions awaiting transfer. If you run the transfers command, it
resets the timer for transfers to complete, so don't run it too often.
The total number of partitions awaiting transfer should slowly decrease.
When zero partitions are waiting to hand off, then you may see
riak-admin ring_status waiting to finish ownership changes. Sometimes it
gets stuck on [riak_kv_vnode], in which case force-handoffs seems to do
the trick. Then it can *also* get stuck on [], and then the long snippet
I linked to does the trick.
So: give it 15 minutes, and check to see if fewer partitions are
awaiting transfer. If you're eager, you can watch the logs for handoff
messages or iptraf that sucker to see the handoff network traffic
directly; it runs on a distinct port IIRC so it's easy to track.
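A rough sketch of that monitoring loop, for reference. The handoff port is
whatever handoff_port is set to in the riak_core section of app.config;
8099 is just the usual default, and the log path and interface below are
assumptions too:

    # how many partitions are still awaiting transfer (run sparingly)
    riak-admin transfers

    # watch a node's log for handoff activity (path depends on your install)
    tail -f log/console.log | grep -i handoff

    # watch handoff traffic on the wire (adjust interface and port)
    tcpdump -i eth0 port 8099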
--Kyle
On 01/18/2012 02:40 PM, Fredrik Lindström wrote:
I just ran the two commands on all 4 nodes.
When run on one of the original nodes, the first command
(riak_core_ring_manager:force_update()) results in output like the
following in the console of the new node:
<snip>
23:20:06.928 [info] loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
23:20:06.929 [info] opened buffer
'./data/merge_index/331121464707782692405522344912282871640797216768/buffer.1'
23:20:06.929 [info] finished loading merge_index
'./data/merge_index/331121464707782692405522344912282871640797216768'
with rollover size 912261.12
23:20:07.006 [info] loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
23:20:07.036 [info] opened buffer
'./data/merge_index/730750818665451459101842416358141509827966271488/buffer.1'
23:20:07.036 [info] finished loading merge_index
'./data/merge_index/730750818665451459101842416358141509827966271488'
with rollover size 1132462.08
23:20:47.050 [info] loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
23:20:47.054 [info] opened buffer
'./data/merge_index/513809169374145557180982949001818249097788784640/buffer.1'
23:20:47.055 [info] finished loading merge_index
'./data/merge_index/513809169374145557180982949001818249097788784640'
with rollover size 975175.6799999999
</snip>
riak_core_vnode_manager:force_handoffs() does not produce any output on
any console on any node besides "OK". No tasty handoff log messages to
be found.
Furthermore I'm not sure what to make of the output from riak-admin
transfers:
't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 62 partitions
'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 42 partitions
'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 42 partitions
Our second node (qbkpx02) is missing from that list. The output also
states that the new node (test) wants to hand off 62 partitions even
though it owns 0 partitions.
riak-admin ring_status lists various pending ownership handoffs, all of
them are between our 3 original nodes. The new node is not mentioned
anywhere.
I'm really curious about the current state of our cluster. It does look
rather exciting :)
/F
------------------------------------------------------------------------
*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 11:15 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
Did you try riak_core_ring_manager:force_update() and force_handoffs()
on the old partition owner as well as the new one? Can't recall off the
top of my head which one needs to execute that handoff.
--Kyle
On Jan 18, 2012, at 2:08 PM, Fredrik Lindström wrote:
Thanks for the response Aphyr.
I'm seeing Waiting on:
[riak_search_vnode,riak_kv_vnode,riak_pipe_vnode] instead of [] so I'm
thinking it's a different scenario.
It might be worth mentioning that the data directory on the new node
does contain relevant subdirectories but the disk footprint is so
small I doubt any data has been transferred.
/F
------------------------------------------------------------------------
*From:* Aphyr [ap...@aphyr.com]
*Sent:* Wednesday, January 18, 2012 10:46 PM
*To:* Fredrik Lindström
*Cc:* riak-users@lists.basho.com
*Subject:* Re: Pending transfers when joining 1.0.3 node to 1.0.0 cluster
https://github.com/basho/riak/blob/riak-1.0.2/RELEASE-NOTES.org
If partition transfer is blocked awaiting [] (as opposed to [kv_vnode]
or whatever), there's a snippet in there that might be helpful.
--Kyle
On Jan 18, 2012, at 1:43 PM, Fredrik Lindström wrote:
After some digging I found a suggestion from Joseph Blomstedt in an
earlier mail thread
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-January/007116.html
to run the following in the Riak console:
riak_core_ring_manager:force_update().
riak_core_vnode_manager:force_handoffs().
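Concretely, running those boils down to something like this (node name is
a placeholder):

    $ riak attach
    (riak@node1)1> riak_core_ring_manager:force_update().
    (riak@node1)2> riak_core_vnode_manager:force_handoffs().
    %% both should just return ok; detach with Ctrl-D (q() would stop the node)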
The symptoms would appear to be the same although the cluster
referenced in the mail thread does not appear to have search enabled,
as far as I can tell from the log snippets. The mail thread doesn't
really specify which node to run the commands on so I tried both the
new node and the current claimant of the cluster.
Sadly the suggested steps did not produce any kind of ownership handoff.
Any helpful ideas would be much appreciated :)
/F
------------------------------------------------------------------------
*From:* riak-users-boun...@lists.basho.com [riak-users-boun...@lists.basho.com]
on behalf of Fredrik Lindström [fredrik.lindst...@qbranch.se]
*Sent:* Wednesday, January 18, 2012 4:00 PM
*To:* riak-users@lists.basho.com
*Subject:* Pending transfers when joining 1.0.3 node to 1.0.0 cluster
Hi everyone,
When we try to join a 1.0.3 node to an existing 1.0.0 (3-node)
cluster, the ownership transfer doesn't appear to take place. I'm
guessing that we're making some stupid little mistake but we can't
figure it out at the moment. Anyone run into something similar?
Riak Search is enabled on the original nodes in the cluster as well
as the new node.
Ring size is set to 128
The various logfiles do not appear to contain any errors or warnings
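For reference, the standard join sequence on 1.0.x is roughly the
following (node names here are placeholders, not our real ones):

    # on the new (1.0.3) node, once riak is started
    riak-admin join riak@existing-node.example.com

    # then, from any node, watch membership and handoff progress
    riak-admin member_status
    riak-admin transfers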
Output from riak-admin member_status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     33.6%      25.0%    'qbkp...@qbkpx01.ad.qnet.local'
valid     33.6%      25.0%    'qbkp...@qbkpx02.ad.qnet.local'
valid     32.8%      25.0%    'qbkp...@qbkpx03.ad.qnet.local'
valid      0.0%      25.0%    't...@qbkpxadmin01.ad.qnet.local'
-------------------------------------------------------------------------------
Output from riak-admin ring_status
See attached file
Output from riak-admin transfers
't...@qbkpxadmin01.ad.qnet.local' waiting to handoff 10 partitions
'qbkp...@qbkpx03.ad.qnet.local' waiting to handoff 62 partitions
'qbkp...@qbkpx01.ad.qnet.local' waiting to handoff 63 partitions
/F
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com