As you may have noticed, this week on the blog I've been tackling the deeper meanings of various behavioral configuration parameters, ranging from ye olde r/w parameters to the much more obscure basic_quorum.
After posting today's missive, the esteemed Andrew Thompson noticed that something I documented was no longer true in v1.3.1, and after some discussions we realized that this change had implications that needed to be shared with the community.

For those unfamiliar: dw is short for durable write, so setting dw values for a bucket or request indicates how many nodes should have the data saved to the backend (typically bitcask or leveldb) before the client is sent a response.

tl;dr == If you set w=1 for performance reasons, make sure you also set dw=1.

Slightly longer version == Until 1.3.1, dw (durable write) would be implicitly demoted to have the same value as w when w was smaller. This is a reasonable optimization for the w=1 case (dw defaults to quorum, despite what you may have read on docs.basho.com) but a very unreasonable behavior when someone explicitly asked for dw=3 without also asking for w=3.

Now in 1.3.1 dw will be 1 (at a minimum), 2 (by default), and 3 (if requested) no matter what value is set for w.

Cross-referenced version == Read http://basho.com/understanding-riaks-configurable-behaviors-part-1/ and http://basho.com/riaks-config-behaviors-part-2/; the latter should be updated today to reflect the 1.3.1 behavior. Also check back on the blog (http://basho.com/blog) later this week for 2 more posts in the series. I think you'll enjoy them.

The really long version == (Actually, this is somewhat tangential to the original point and shorter than my blog posts, so it's really the longish pedantic version.)

This explanation assumes default behaviors, such as vnode_vclocks=true and n_val=3. Vnode-based vector clocks are the defining behavioral characteristic that makes this flow what it is.

When a write request arrives at the coordinating node, contrary to what one might expect, it is not immediately sent to the other 2 nodes with responsibility over the key.
Instead, the request is handed to the local vnode mapped to that key, and until the vnode replies back with a new vector clock, nothing else happens. So, the approximate sequence of events:

1. Coordinating node receives request
2. Request is forwarded to local vnode
3. Local vnode replies with "w" message to the coordinating node indicating it has received the request
4. Local vnode creates a new vector clock based on the vclock received with the request, if any, and possibly impacted by any existing object with the same key
5. Local vnode sends the new object to the backend
6. Local vnode replies with "dw" message and new object to the coordinating node
7. If w=1 and dw=1, /now/ the coordinating node replies to the client, with the new vclock if requested by the client
8. The coordinating node sends the new object with new vclock to the remote vnodes that also own the key
9. Each vnode will reply with a "w" message upon receipt
10. Each vnode will reply with a "dw" message upon sending the object to its backend
11. If w>1 or dw>1, the coordinating node replies to the client once it has received enough successful replies from the remote vnodes to meet those values

(This is why it's not meaningful, with vnode_vclocks=true, to set dw=0. It has a minimum effective value of 1, regardless of what the client or operator wishes, because the first vnode must construct a new vector clock and store the object to disk before the client can ever receive a response.)

And as you can see, all activity before the reply to the client is local to the coordinating node when w=1 and dw=1, and the response can be sent back to the client before the request is forwarded to other nodes.

Prior to 1.3.1, dw would be effectively 1 if w was set to 1. Now, with 1.3.1, both w and dw must be set to 1 before that optimal response time can be achieved.
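To make the before/after difference concrete, here is a rough Python model of the coordinator's bookkeeping. It is only a sketch, not Riak's code: the names resolve, effective_dw and write_events are hypothetical, it assumes n_val=3 and vnode_vclocks=true, it treats "quorum" as n_val // 2 + 1, and it assumes remote replies arrive in a fixed order (in a real cluster they can interleave arbitrarily).

```python
# A simplified, hypothetical model of when a Riak coordinator replies to a
# write, assuming n_val=3 and vnode_vclocks=true. Not Riak's actual code.

N_VAL = 3  # assumed default replica count


def resolve(value, n_val=N_VAL):
    """Convert a w/dw setting ("quorum" or an integer) into a node count."""
    if value == "quorum":
        return n_val // 2 + 1  # 2 when n_val=3
    return int(value)


def effective_dw(w, dw="quorum", n_val=N_VAL, pre_131=False):
    """Number of durable-write acks the coordinator waits for."""
    w_n, dw_n = resolve(w, n_val), resolve(dw, n_val)
    if pre_131:
        # Before 1.3.1, dw was silently demoted to w when w was smaller.
        dw_n = min(dw_n, w_n)
    # The local vnode always builds the vclock and stores the object before
    # any reply is possible, so dw is never effectively below 1.
    return max(dw_n, 1)


def write_events(w, dw="quorum", n_val=N_VAL, pre_131=False):
    """Ordered events seen by the coordinator for a single write."""
    w_n = resolve(w, n_val)
    dw_n = effective_dw(w, dw, n_val, pre_131)
    events = []
    acks = {"w": 0, "dw": 0}

    def ack(kind, who):
        acks[kind] += 1
        events.append(f"{who}: {kind}")
        # Reply exactly once, as soon as both thresholds are met.
        ready = acks["w"] >= w_n and acks["dw"] >= dw_n
        if ready and "reply to client" not in events:
            events.append("reply to client")

    # Steps 2-6: everything happens on the local vnode first.
    ack("w", "local vnode")
    events.append("local vnode: new vclock, object sent to backend")
    ack("dw", "local vnode")

    # Steps 8-10: only now is the object forwarded to the other owners
    # (simplification: each remote vnode's w and dw arrive back to back).
    events.append("forward to remote vnodes")
    for i in range(1, n_val):
        ack("w", f"remote vnode {i}")
        ack("dw", f"remote vnode {i}")
    return events


if __name__ == "__main__":
    for pre in (True, False):
        label = "pre-1.3.1" if pre else "1.3.1"
        print(label, write_events(w=1, pre_131=pre))
```

In this model, write_events(1, pre_131=True) places "reply to client" before "forward to remote vnodes", while under the 1.3.1 rules write_events(1) only replies after the first remote vnode's dw message; write_events(1, 1) is what restores the early, local-only reply.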
-John

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com