As you may have noticed, this week on the blog I've been tackling the deeper 
meanings of various behavioral configuration parameters, ranging from ye olde 
r/w parameters to the much more obscure basic_quorum.

After posting today's missive, the esteemed Andrew Thompson noticed that 
something I documented was no longer true in v1.3.1, and after some discussions 
we realized that this change had implications that needed to be shared with the 
community.

For those unfamiliar: dw is short for durable write, so setting dw values for a 
bucket or request indicates how many nodes should have the data saved to the 
backend (typically bitcask or leveldb) before the client is sent a response.


tl;dr
==
If you set w=1 for performance reasons, make sure you also set dw=1.


Slightly longer version
==
Until 1.3.1, dw (durable write) would be implicitly demoted to have the same 
value as w when w was smaller. This is a reasonable optimization for the w=1 
case (dw defaults to quorum, despite what you may have read on docs.basho.com) 
but a very unreasonable behavior when someone explicitly asked for dw=3 without 
also asking for w=3.

Now in 1.3.1 (assuming n_val=3) dw will be 1 at a minimum, 2 by default 
(quorum), or 3 if requested, no matter what value is set for w.
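
To make the before/after difference concrete, here is a minimal sketch of the 
rule described above. This is not Riak's code: the function names and the 
pre_1_3_1 flag are made up for illustration, and the quorum formula 
(n_val // 2 + 1) is the usual majority calculation.

```python
def quorum(n_val):
    """Majority of n_val replicas, e.g. 2 when n_val=3."""
    return n_val // 2 + 1


def effective_dw(w, dw=None, n_val=3, pre_1_3_1=False):
    """Hypothetical model of the dw value the coordinator waits for.

    dw=None means "not specified", which falls back to quorum.
    """
    if dw is None:
        dw = quorum(n_val)
    if pre_1_3_1:
        # Old behavior: dw was implicitly demoted to w when w was smaller.
        dw = min(dw, w)
    # dw has a minimum effective value of 1 (see the long version below).
    return max(1, dw)


if __name__ == "__main__":
    for label, old in (("before 1.3.1", True), ("1.3.1", False)):
        print(label,
              "| w=1:", effective_dw(1, pre_1_3_1=old),
              "| w=1,dw=3:", effective_dw(1, dw=3, pre_1_3_1=old),
              "| w=1,dw=1:", effective_dw(1, dw=1, pre_1_3_1=old))
```

The only case where the two versions agree for w=1 is when dw=1 is also set 
explicitly, which is the point of the tl;dr.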


Cross-referenced version
==
Read http://basho.com/understanding-riaks-configurable-behaviors-part-1/ and 
http://basho.com/riaks-config-behaviors-part-2/; the latter should be updated 
today to reflect the 1.3.1 behavior.

Also check back on the blog (http://basho.com/blog) later this week for 2 more 
posts in the series. I think you'll enjoy them.


The really long version
==
(Actually, this is somewhat tangential to the original point and shorter than 
my blog posts, so it's really the longish pedantic version.)

This explanation assumes default behaviors, such as vnode_vclocks=true and 
n_val=3. Vnode-based vector clocks are the defining behavioral characteristic 
that makes this flow what it is.


When a write request arrives at the coordinating node, it is not, contrary to 
what one might expect, immediately sent to the other 2 nodes with 
responsibility over the key.

Instead, the request is handed to the local vnode mapped to that key, and until 
the vnode replies back with a new vector clock, nothing else happens.

So, the approximate sequence of events:

1  Coordinating node receives request
2  Request is forwarded to local vnode
3  Local vnode replies with "w" message to the coordinating node indicating 
   it has received the request
4  Local vnode creates a new vector clock based on the vclock received with 
   the request, if any, and possibly impacted by any existing object with 
   the same key
5  Local vnode sends the new object to the backend
6  Local vnode replies with "dw" message and new object to the coordinating 
   node
7  If w=1 and dw=1, /now/ the coordinating node replies to the client, with 
   the new vclock if requested by the client
8  The coordinating node sends the new object with new vclock to the remote 
   vnodes that also own the key
9  Each vnode will reply with a "w" message upon receipt
10 Each vnode will reply with a "dw" message upon sending the object to its 
   backend
11 If w>1 or dw>1, the coordinating node replies to the client once it has 
   received enough successful replies from the remote vnodes to meet those 
   values

(This is why it's not meaningful, with vnode_vclocks=true, to set dw=0. It has 
a minimum effective value of 1, regardless of what the client or operator 
wishes, because the first vnode must construct a new vector clock and store the 
object to disk before the client can ever receive a response.)

And as you can see, all activity before the reply to the client is local to the 
coordinating node when w=1 and dw=1, and the response can be sent back to the 
client before the request is forwarded to other nodes.
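
The ordering can be sketched as a toy timeline. This is a simplified model 
under stated assumptions, not how Riak is implemented: put_timeline and the 
event strings are invented for illustration, and the remote vnodes are 
assumed to answer strictly in order, whereas real replies interleave.

```python
REPLY = "reply to client"
FORWARD = "coordinator forwards object to remote vnodes"


def put_timeline(w, dw, n_val=3):
    """Return the ordered events for one write (hypothetical model)."""
    dw = max(dw, 1)  # the local vnode always stores before any reply
    events = []
    acks = {"w": 0, "dw": 0}
    replied = False

    def record(event, kind=None):
        nonlocal replied
        events.append(event)
        if kind:
            acks[kind] += 1
        # Reply as soon as both thresholds are met, and only once.
        if not replied and acks["w"] >= w and acks["dw"] >= dw:
            events.append(REPLY)
            replied = True

    record("coordinator receives request")
    record("local vnode: w ack", "w")
    record("local vnode: new vclock, object sent to backend")
    record("local vnode: dw ack", "dw")
    record(FORWARD)
    for i in range(1, n_val):
        record(f"remote vnode {i}: w ack", "w")
        record(f"remote vnode {i}: dw ack", "dw")
    return events


if __name__ == "__main__":
    for w, dw in ((1, 1), (1, 2)):
        print(f"w={w} dw={dw}")
        for event in put_timeline(w, dw):
            print("   ", event)
```

With w=1 and dw=1 the reply lands before the forward step; with w=1 and dw=2 
it has to wait for a remote "dw" message.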

Prior to 1.3.1, dw would be effectively 1 if w was set to 1. Now, with 1.3.1, 
both w and dw must be set to 1 before that optimal response time can be 
achieved.
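
For anyone setting these per request over HTTP, here is a small sketch that 
only builds the URL and sends nothing. The put_url helper is made up; the 
/buckets/<bucket>/keys/<key> path and port 8098 are assumptions about a 
default install, so check the HTTP API docs for your version before relying 
on them.

```python
from urllib.parse import quote, urlencode


def put_url(bucket, key, w=1, dw=1, host="localhost", port=8098):
    """Build a store URL with explicit w and dw (hypothetical helper)."""
    query = urlencode({"w": w, "dw": dw})
    path = f"/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return f"http://{host}:{port}{path}?{query}"


if __name__ == "__main__":
    # Both values are spelled out; w=1 alone no longer implies dw=1.
    print(put_url("test", "doc"))
```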

-John


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
