Hi, I'm currently assessing how well Riak fits our needs as a large-scale data store.
In the course of testing Riak, I've set up a cluster on Amazon EC2 with 6 nodes across two instances (m2.xlarge). After seeing surprisingly bad write performance (which I'll write more about in a separate post once I've finished my tests), I wanted to migrate the cluster to instances with better IO performance. Let's call the original EC2 instances A and B; the plan was to migrate the cluster to new EC2 instances called C and D. During the following actions no other processes were reading from or writing to the cluster, and all instances are in the same availability zone.

What I did so far was to tell all Riak nodes on B to leave the ring and let the ring re-stabilize. One surprising behaviour here was that the Riak nodes on A all suddenly went into uninterruptible sleep (process state D) for about 30 minutes, and all riak-admin status/transfers calls claimed all nodes were down, when in fact they were up and quite busy. Left to themselves, they sorted everything out in the end.

Then I set up 3 new Riak nodes on C and told them to join the cluster. So far everything went well: riak-admin transfers showed that both the nodes on A and the nodes on C were waiting for handoffs. However, the handoffs didn't start. I gave the cluster an hour, but no data transfer to the new nodes was initiated. Since I didn't find any way to trigger the handoff manually, I told all the nodes on A (riak01, riak02 and riak03) to leave the cluster, and after the last node on A had left the ring, the handoffs started.

After all the data on riak01 had been moved to the nodes on C, its master process shut down and the handoff of the remaining data from riak02 and riak03 stopped. I tried restarting riak01 manually; however, riak-admin ringready claims that riak01 and riak04 (on C) disagree on the partition owners. riak-admin transfers still lists the same number of partitions awaiting handoff as when the handoff to the nodes on C started.

My current data distribution is as follows (via du -c):

On A:
       1780  riak01/data
     188948  riak02/data
    3766736  riak03/data

On C:
   13215908  riak04/data
    1855584  riak05/data
    5745076  riak06/data

riak04 and riak05 are each awaiting the handoff of 341 partitions, riak06 of 342. The ring_creation_size is 512, n_val for the bucket is 3, and w is set to 1.

My questions at this point are:

1. What would normally trigger a rebalancing of the nodes?
2. Is there a way to trigger a rebalancing manually?
3. Did I do anything wrong in the procedure described above, to be left in this odd state by Riak?
4. How would I rectify this situation in a production environment?

Regards,
Sven

------------------------------------------
Scoreloop AG, Brecherspitzstrasse 8, 81541 Munich, Germany, www.scoreloop.com
sven.rie...@scoreloop.com
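P.S. For completeness, here is roughly the sequence of commands I ran for the migration. The hostnames are illustrative placeholders and I'm reconstructing this from memory, so treat it as a sketch rather than an exact transcript:

    # step 1: on each of the 3 nodes on B, leave the ring
    riak-admin leave

    # step 2: on each of the 3 fresh nodes on C, join the cluster
    # via a node that is still in the ring (one of the nodes on A)
    riak-admin join riak01@<host-on-A>

    # step 3: after an hour with no handoff activity, on riak01,
    # riak02 and riak03 (on A), leave the ring as well
    riak-admin leave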
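P.P.S. These are the commands I've been using to watch the cluster while all this was going on, in case their output helps with the diagnosis:

    # per-node view of partitions awaiting handoff
    riak-admin transfers

    # checks whether all nodes agree on partition ownership
    riak-admin ringready

    # node statistics (one of the calls that claimed nodes were down)
    riak-admin status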