Nodes missing primary partitions

Anthony Molinaro Wed, 15 Feb 2012 12:30:48 -0800

Hi,

  I have a 12-node Riak cluster running Riak 0.14.2.  Several nodes
crashed with OOM errors, and after restarting them I see the following
when running riak-admin transfers:


Attempting to restart script through sudo -u riak
'[email protected]' waiting to handoff 1 partitions
'[email protected]' does not have 1 primary partitions running
'[email protected]' waiting to handoff 1 partitions
'[email protected]' does not have 1 primary partitions running
'[email protected]' waiting to handoff 1 partitions
'[email protected]' does not have 1 primary partitions running
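
To compare this output across restarts, it can help to tally it per node. The following is a minimal sketch, assuming the two line shapes shown above ("waiting to handoff N partitions" and "does not have N primary partitions running") are the only ones of interest. The node names in SAMPLE are hypothetical placeholders, not the real hosts, and summarize_transfers is an illustrative helper, not part of riak-admin.

```python
import re
from collections import defaultdict

# Patterns for the two line shapes seen in the `riak-admin transfers` output.
WAITING = re.compile(r"^'(?P<node>[^']+)' waiting to handoff (?P<n>\d+) partitions$")
MISSING = re.compile(
    r"^'(?P<node>[^']+)' does not have (?P<n>\d+) primary partitions running$"
)


def summarize_transfers(text):
    """Return {node: {"waiting": int, "missing_primaries": int}}."""
    summary = defaultdict(lambda: {"waiting": 0, "missing_primaries": 0})
    for raw in text.splitlines():
        line = raw.strip()
        m = WAITING.match(line)
        if m:
            summary[m.group("node")]["waiting"] += int(m.group("n"))
            continue
        m = MISSING.match(line)
        if m:
            summary[m.group("node")]["missing_primaries"] += int(m.group("n"))
        # Any other line (e.g. the sudo notice) is ignored.
    return dict(summary)


# Hypothetical node names, used only for illustration.
SAMPLE = """\
Attempting to restart script through sudo -u riak
'riak@node-a.example' waiting to handoff 1 partitions
'riak@node-a.example' does not have 1 primary partitions running
'riak@node-b.example' waiting to handoff 1 partitions
"""

if __name__ == "__main__":
    for node, counts in sorted(summarize_transfers(SAMPLE).items()):
        print(node, counts)
```

A node that shows up under both headings, as each node does in the output above, is both holding a partition it wants to hand off and missing one of its own primaries.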

The only errors in the whole cluster are 2 errors on 10.5.10.30, both of
the form:


=ERROR REPORT==== 15-Feb-2012::17:49:38 ===
Handoff receiver for partition
1044745311060762632934665329637030439832170528768
exiting abnormally after processing 7 objects:
{timeout,
 {gen_fsm,
  sync_send_all_state_event,
  [<0.1299.1>,
   {handoff_data,
    
<<141,144,203,78,2,49,20,134,207,12,76,4,77,12,49,38,38,174,221,54,25,102,64,134,189,151,141,168,33,6,217,145,127,218,142,157,138,29,3,229,9,120,3,119,190,133,91,151,238,92,243,68,182,74,162,184,162,77,78,206,245,59,253,187,27,204,15,78,178,110,154,138,62,151,44,73,58,49,139,139,172,199,210,12,9,227,167,73,12,153,246,179,162,43,143,95,107,75,181,35,168,49,155,84,185,150,220,62,17,81,48,247,118,171,249,169,111,87,53,65,205,217,132,87,198,74,99,85,83,80,93,148,220,34,68,203,221,6,110,17,171,150,254,119,84,240,55,247,205,241,230,200,175,222,27,179,97,137,71,54,186,195,3,122,24,59,66,31,6,23,232,224,25,9,46,113,239,162,129,243,135,46,51,69,14,141,54,82,156,109,144,2,79,58,92,147,174,48,183,108,80,137,178,40,165,80,181,156,40,106,231,20,209,75,78,225,232,195,141,249,230,214,242,71,78,136,243,252,230,154,175,214,48,21,250,98,157,150,246,221,149,2,19,209,74,191,125,238,235,95,161,193,246,66,55,159,40,40,226,83,9,227,64,118,182,144,90,187,143,92,24,33,139,210,72,241,5>>},60000]}}

=ERROR REPORT==== 15-Feb-2012::17:49:41 ===
Handoff receiver for partition
1044745311060762632934665329637030439832170528768
exiting abnormally after processing 7 objects: 
{timeout,
 {gen_fsm,
  sync_send_all_state_event,
  [<0.1299.1>,
   {handoff_data,
    
<<141,144,75,78,195,48,16,134,39,105,2,41,72,168,2,36,36,214,108,88,88,74,232,35,112,128,34,22,45,32,132,16,130,69,245,59,118,112,210,226,208,36,101,193,182,27,14,193,33,184,0,123,142,133,13,149,160,172,234,209,140,172,121,124,227,223,27,78,181,125,16,71,224,113,24,167,44,13,123,156,197,221,78,155,161,23,113,150,182,79,142,142,35,36,157,94,39,222,127,107,204,213,186,160,160,28,21,60,151,73,253,72,68,78,101,227,74,243,19,219,174,26,130,154,229,40,41,116,45,117,173,154,130,60,145,37,53,92,180,140,5,184,68,168,90,249,191,163,156,191,185,111,142,13,123,118,245,230,45,187,202,48,102,55,215,120,64,23,5,198,198,43,115,215,40,241,132,33,6,120,49,126,10,133,115,72,244,49,53,89,99,75,36,199,146,118,23,164,1,170,154,13,11,145,165,153,20,170,193,137,252,136,147,79,103,156,214,158,61,51,102,155,119,230,63,114,92,83,110,222,241,139,254,97,176,224,41,215,214,61,239,254,35,84,46,28,237,211,107,254,254,185,149,255,106,117,86,215,186,252,74,65,126,50,145,208,6,84,151,51,153,231,230,47,103,90,200,52,211,82,124,1>>},60000]}}

I tried cycling through the cluster and restarting all nodes, which seemed
to temporarily fix this particular node, but then I think this error
cropped up.

If there's anything I can try, or more information I can give, let me know.
The boxes are 16 core with 24 GB of memory, and the data is in bitcask on an
SSD.  There are 1024 partitions spread across 12 machines.  Each machine
does roughly 55-120K vnode gets per second, 20-40K node gets per second,
1-2K vnode puts per second, and 1-2K node puts per second.
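
The long partition number in the error reports can be related to the 1024-partition ring with a little arithmetic. This is a rough sketch, assuming the ring covers a 160-bit (SHA-1 sized) keyspace split into equal slices, so each partition id is a multiple of 2^160 / 1024. The partition_index helper is hypothetical, written only for this illustration.

```python
RING_BITS = 160  # assumption: the ring spans a 160-bit hash space


def partition_index(partition_id, ring_size):
    """Map a partition id to its 0-based position on the ring."""
    span = 2**RING_BITS // ring_size  # width of one partition's slice
    index, remainder = divmod(partition_id, span)
    if remainder != 0:
        raise ValueError("id is not aligned to a partition boundary")
    return index


# Partition id copied from the handoff receiver error reports.
PARTITION_ID = 1044745311060762632934665329637030439832170528768
RING_SIZE = 1024
NODE_COUNT = 12

if __name__ == "__main__":
    print("partition index:", partition_index(PARTITION_ID, RING_SIZE))
    print("average partitions per node:", RING_SIZE / NODE_COUNT)
```

With 1024 partitions on 12 machines, each node owns about 85 partitions on average, so a single partition stuck in handoff is a small fraction of any one node's share.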

Thanks for the help,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <[email protected]>

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
