Given that 1.0 prerelease packages are now available, I wanted to
mention some changes to Riak's clustering capabilities in 1.0. In
particular, there are some subtle semantic differences in the
riak-admin commands. More complete docs will be updated in the near
future, but I hope a quick email suffices for now.

[nodeB/riak-admin join nodeA] is now strictly one-way. It joins nodeB
to the cluster that nodeA is a member of. This is semantically
different than pre-1.0 Riak in which join essentially joined clusters
together rather than joined a node to a cluster. As part of this
change, the joining node (nodeB in this case) must be a singleton
(1-node) cluster.

In pre-1.0, leave and remove were essentially the same operation, with
leave just being an alias for 'remove this-node'. This has changed.
Leave and remove are now very different operations.

[nodeB/riak-admin leave] is the only safe way to have a node leave the
cluster, and it must be executed by the node that you want to remove.
In this case, nodeB will start leaving the cluster, and will not leave
the cluster until after it has handed off all its data. Even if nodeB
is restarted (crashed/shutdown/whatever), it will remain in the leave
state and continue handing off partitions until done. After handoff,
it will leave the cluster, and eventually shutdown.

[nodeA/riak-admin remove nodeB] immediately removes nodeB from the
cluster, without handing off its data. All replicas held by nodeB are
therefore lost, and will need to be re-generated through read-repair.
Use this command carefully. It's intended for nodes that are
permanently unrecoverable and therefore for which handoff doesn't make
sense. By the final 1.0 release, this command may be renamed
"force-remove" just to make the distinction clear.

There are now two new commands that provide additional insight into
the cluster. [riak-admin member_status] and [riak-admin ring_status].

Underneath, the clustering protocol has been mostly re-written. The
new approach has the following advantages:
1. It is no longer necessary to wait on [riak-admin ringready] in
between adding/removing nodes from the cluster, and adding/removing is
also much more sound/graceful. Starting up 16 nodes and issuing
[nodeX: riak-admin join node1] for X=1:16 should just work.

2. Data is first transferred to new partition owners before handing
over partition ownership. This change fixes numerous bugs, such as
404s/not_founds during ownership changes. The Ring/Pending columns in
[riak-admin member_status] visualize this at a high-level, and the
full transfer status in [riak-admin ring_status] provide additional
insight.

3. All partition ownership decisions are now made by a single node in
the cluster (the claimant). Any node can be the claimant, and the duty
is automatically taken over if the previous claimant is removed from
the cluster. [riak-admin member_status] will list the current
claimant.

4. Handoff related to ownership changes can now occur under load;
hinted handoff still only occurs when a vnode is inactive. This change
allows a cluster to scale up/down under load, although this needs to
be further benchmarked and tuned before 1.0.

To support all of the above, a new limitation has been introduced.
Cluster changes (member addition/removal, ring rebalance, etc) can
only occur when all nodes are up and reachable. [riak-admin
ring_status] will complain when this is not the case. If a node is
down, you must issue [riak-admin down <node>] to mark the node as
down, and the remaining nodes will then proceed to converge as usual.
Once the down node comes back online, it will automatically
re-integrate into the cluster. However, there is nothing preventing
client requests being served by a down node before it re-integrates.
Before issuing [down <node>], make sure to update your load balancers
/ connection pools to not include this node. Future releases of Riak
may make offlining a node an automatic operation, but it's a
user-initiated action in 1.0.

-Joe

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to