Hi Soren, I forgot to reply-all so I'm resending to keep the discussion on the mailing list.
First reply:

A few months back we added options on the master branch to the get/put
FSM to control how many *primary* nodes (partition owners) are required
for each get and put operation (PR and PW for get and put
respectively). If you set PR+PW > N then you should get the behavior
you desire in point 1. Otherwise there's a chance that your write could
go to (possibly different) fallback nodes, which would *eventually*
deliver the result back to the current owner. For a locking mechanism,
eventual consistency is undesirable.

For (2), nothing protects you from concurrent writes - two puts could
be coordinated at the same time, and because multiple nodes (well,
vnodes) are involved there is no guarantee of ordering. Being prepared
to handle conflicts is a much safer strategy. You may be able to use
the returnbody option in combination with PR/PW to check whether you
got what you wanted: if the result you receive after the put contains
only the client that locked it, treat it as success; otherwise the lock
failed. I'm not quite sure how you would clean up on collisions, as
it's possible that both clients could lock. Even a solution of 'lowest
sorted client wins' wouldn't necessarily work if a client with a higher
id successfully locked and then a lower one created a conflict. If you
can work out a way to reliably clean up, you may be onto something.

Follow-ups from Soren:

> Referencing (2) - that was what I was trying to achieve by simply
> setting allow_mult=false and reading back the value afterwards.
> Thinking about this some more, I guess I'm still a bit stuck in my SQL
> thinking and figured that setting dw=n would somehow synchronise the
> write across the different nodes, but I gather that's not actually the
> case (since that would imply some kind of transaction rollback
> mechanism)?

As you correctly observe, there are no transactions. DW guarantees that
the storage backend has accepted the write on that many nodes;
depending on how you've configured the backend, operating system, and
disk controllers/disks, it may mean the data has been written to disk.
On failure, no transactional rollback occurs. A consequence of this is
that a write can fail to meet the PW/W/DW requirements and still be
written to some vnodes, so a later read can return the 'failed' value.
I wish we had made the returns 'ok' or 'maybe' instead of
{'error', Blah}.

> I seem to have not understood how returnbody works under the hood.
> Does it essentially perform a GET operation after finishing the PUT? I
> was afraid it would return whatever I had passed in my PUT request,
> effectively masking conflicts when attempting the write.

Whenever Riak does a put, it retrieves the current object on each
vnode, updates it with the incoming value - either replacing it or
creating siblings, depending on the vclock - then writes the object.
Returnbody returns that written value to the put FSM, saving a separate
get call. It's really just an optimization; you could issue the get
yourself.
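To make that concrete, here is a minimal sketch of the returnbody check
using the Riak Erlang client (riakc). The bucket/key names are
illustrative, the pw option assumes a build with the PR/PW work from
master, and allow_mult=true is assumed on the bucket - treat it as a
sketch, not a tested recipe:

    %% Attempt a lock and use return_body to inspect what the vnodes
    %% actually stored. Assumes a connected client, e.g.
    %% {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
    -module(lock_sketch).
    -export([try_lock/2]).

    try_lock(Pid, ClientId) when is_binary(ClientId) ->
        Obj = riakc_obj:new(<<"locks">>, <<"my_lock">>, ClientId),
        %% pw/w set to all primaries; return_body hands back the object
        %% exactly as it was written, siblings included.
        case riakc_pb_socket:put(Pid, Obj,
                                 [{pw, all}, {w, all}, return_body]) of
            {ok, Written} ->
                case riakc_obj:get_values(Written) of
                    [ClientId] -> locked;  %% only our id stored: success
                    _Siblings  -> failed   %% someone else raced us
                end;
            {error, _Reason} ->
                %% NB: a 'failed' write may still land on some vnodes
                failed
        end.

A caller would just pattern match on the result, e.g.
lock_sketch:try_lock(Pid, <<"client-42">>).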
> Did you miss the last part of my algorithm where I read back the value
> after the supposedly successful write for verification? If not, how is
> what you're suggesting different? I suggested writing the client id
> with allow_mult=false and w=dw=n, and then reading it back again
> afterwards (with r=n). If I didn't find the value I expect, start over
> (with the PUT with If-None-Match: * etc.) until I successfully PUT and
> then READ back my own ID. Once I'm done with the lock, DELETE it.

The clean-up problem I was talking about is the case where two clients
try to lock at the same time and neither gets the lock. This can happen
depending on the order in which the vnodes process the put requests:

    Client1: get returns not found
    Client2: get returns not found
    Client1: put(<<"b">>, <<"lock">>, <<"client 1 lock">>)
        while
    Client2: put(<<"b">>, <<"lock">>, <<"client 2 lock">>)
    Client1: get(<<"b">>, <<"lock">>) returns [<<"client 1 lock">>]
    Client2: get(<<"b">>, <<"lock">>) returns siblings
             [<<"client 1 lock">>, <<"client 2 lock">>]

So client1 gets the lock, and client2 sees that it failed. If the
ordering of node message delivery is different, both clients may
succeed.

If-None-Match is implemented in the HTTP/PBC interfaces as a get to
check the vclock followed by a put, so there is still a chance of
creating a conflict when both If-None-Match gets complete before the
puts:

    Client1: get returns not found
    Client2: get returns not found
    Client1: put(<<"b">>, <<"lock">>, <<"client 1 lock">>)
        while
    Client2: put(<<"b">>, <<"lock">>, <<"client 2 lock">>)
    Client1: get(<<"b">>, <<"lock">>) returns siblings
             [<<"client 1 lock">>, <<"client 2 lock">>]
    Client2: get(<<"b">>, <<"lock">>) returns siblings
             [<<"client 1 lock">>, <<"client 2 lock">>]

In this case neither client gets the lock. Notice that client2 sees the
same siblings as in the case where client1 succeeded, so it cannot tell
whether the lock is held; this makes knowing when it is safe to delete
and retry the lock very hard. Setting allow_mult=false makes things
worse: Riak will just choose the sibling with the latest timestamp, and
because the If-None-Match check is not atomic it will hide what really
happened.

The If-None-Match behavior isn't ideal. I've been thinking about
pushing the If-None-Match option down to the vnode level, but I don't
think it resolves all cases; it needs some thought.
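For completeness, here is the same race expressed against the Erlang
client - a sketch only, assuming riakc exposes the PBC if_none_match
flag and that allow_mult=true is set on the bucket; the exact error
term on a failed conditional put is not pinned down here:

    %% Add to the lock_sketch module above. The if_none_match put is
    %% get-then-put under the hood, so two clients can both pass the
    %% existence check before either put lands.
    if_none_match_attempt(Pid, ClientId) when is_binary(ClientId) ->
        Obj = riakc_obj:new(<<"b">>, <<"lock">>, ClientId),
        case riakc_pb_socket:put(Pid, Obj, [if_none_match, {w, all}]) of
            {error, _} ->
                failed;  %% key already existed (or the put failed)
            ok ->
                %% Read back and inspect siblings, as in the traces above.
                {ok, Read} = riakc_pb_socket:get(Pid, <<"b">>, <<"lock">>,
                                                 [{r, all}]),
                case riakc_obj:get_values(Read) of
                    [ClientId] -> locked;   %% only our value survived
                    [_Other]   -> failed;   %% the other client won cleanly
                    _Two       -> conflict  %% siblings: maybe neither holds it
                end
        end.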
Best Regards,

Jon Meredith
Senior Software Engineer
Basho Technologies

On Sun, Jul 31, 2011 at 2:55 PM, Soren Hansen <so...@linux2go.dk> wrote:
> I've seen a couple of posts here and there on the subject of a locking
> mechanism for Riak, most notably:
>
> http://riak-users.197444.n3.nabble.com/Riak-and-Locks-td1866960.html
>
> While it would only serve as an advisory locking mechanism, wouldn't a
> bucket with a reasonably high n, w and dw set equal to n, a
> deterministic naming scheme for the object being locked, and a locking
> algorithm such as:
>
> 1. PUT /locks/object_id
>    If-None-Match: *
>    Body: <some unique identifier for the client thread>
>
> 1a. If this fails, wait for a while, then try again.
> 1b. If it succeeds, proceed to 2.
>
> 2. The doc for If-None-Match says "this does not prevent concurrent
> writes; it is possible for the condition to evaluate to true for
> multiple requests if the requests occur at the same time." I'm not
> completely sure if n=w=dw protects me from concurrent writes (I'm not
> familiar with the locking semantics of a single Riak instance).
> Anyway, if I'm in fact not protected, the next step is to read the
> value back to make sure we're actually the ones holding the key. If
> not, go back to step 1. If yes, proceed as planned.
>
> 3. Once you're done with the lock, just DELETE it.
>
> If this were really that simple, someone would have suggested it. So,
> what is this Riak rookie (i.e. I) missing?
>
> --
> Soren Hansen | http://linux2go.dk/
> Ubuntu Developer | http://www.ubuntu.com/
> OpenStack Developer | http://www.openstack.org/
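Stitched together, the quoted algorithm might look like the following,
continuing the lock_sketch module above; the retry delay is an
arbitrary assumption:

    %% Acquire with retry (steps 1/1a), run the critical section, then
    %% DELETE the lock (step 3). Backoff value is arbitrary.
    with_lock(Pid, ClientId, Fun) ->
        case try_lock(Pid, ClientId) of
            locked ->
                try
                    Fun()
                after
                    ok = riakc_pb_socket:delete(Pid, <<"locks">>,
                                                <<"my_lock">>)
                end;
            failed ->
                timer:sleep(100),  %% wait for a while, then try again
                with_lock(Pid, ClientId, Fun)
        end.

As the traces earlier in the thread show, the read-back step cannot
distinguish "client1 holds the lock" from "neither client does", so
even this loop is advisory at best.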
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com