Re: Clarifying "Read-before-Write"

2011-11-26 Thread Russell Brown

On 26 Nov 2011, at 01:14, Andres Jaan Tack wrote:

> So I was just reading and thinking about this, and I don't understand the 
> advice offered under "Read-before-Write" at 
> http://wiki.basho.com/Client-Implementation-Guide.html.
> 
> "Riak will return an encoded vector clock with every "fetch" or "read" 
> request that does not result in a "not found" response. In addition to the 
> Client ID, this vector clock tells Riak how to resolve concurrent writes, 
> essentially representing the "last seen" version of the object to which the 
> client made modifications. In order to prevent sibling explosion, clients 
> should always have a vector clock before sending a write, and send the vector 
> clock as part of the write request. Therefore, it is essential that keys are 
> fetched before being written (except in the case where Riak selects the key 
> or there is a priori knowledge that the key is new). Client libraries that 
> make this automatic will reduce operational issues by limiting sibling 
> explosion. Clients may also choose to perform automatic Sibling Resolution on 
> read."
>  
> I'm having trouble understanding the advice. I get that if I'm aware of all 
> the siblings, I can resolve them (optionally) with that vector clock. What I 
> don't understand here: If an application PUTs to an object out of the blue, 
> not having read it first, should the client library read-before-write?

Yes it should.
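To see why sending the vector clock matters, here is a deliberately simplified Python model of how one key could accumulate siblings, assuming sibling storage (the `allow_mult` bucket property) is enabled. It is a toy, not Riak's implementation. `ToyKey`, `descends` and `increment` are illustrative names, and real Riak also involves vnodes, quorums and per-bucket settings. It shows one thing: a write that carries the clock from an earlier fetch supersedes every version that clock has seen, while a blind write from another client cannot, so it becomes a sibling.

```python
from typing import Dict, List, Optional, Tuple

VClock = Dict[str, int]  # toy vector clock: client ID -> counter


def descends(a: VClock, b: VClock) -> bool:
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def increment(clock: VClock, client_id: str) -> VClock:
    """Return a copy of `clock` with `client_id`'s counter bumped by one."""
    new = dict(clock)
    new[client_id] = new.get(client_id, 0) + 1
    return new


class ToyKey:
    """One key's stored versions; more than one version means siblings."""

    def __init__(self) -> None:
        self.versions: List[Tuple[VClock, str]] = []

    def put(self, value: str, client_id: str, vclock: Optional[VClock] = None) -> None:
        # A blind write (vclock=None) starts from an empty clock.
        new_clock = increment(vclock or {}, client_id)
        # Keep only the versions this write has NOT already seen.
        kept = [(c, v) for c, v in self.versions if not descends(new_clock, c)]
        self.versions = kept + [(new_clock, value)]

    def get(self) -> Tuple[List[str], VClock]:
        """Return all sibling values plus a clock covering all of them."""
        merged: VClock = {}
        for clock, _ in self.versions:
            for actor, count in clock.items():
                merged[actor] = max(merged.get(actor, 0), count)
        return [v for _, v in self.versions], merged


key = ToyKey()
key.put("v1", client_id="alice")                   # first write
key.put("v2", client_id="bob")                     # blind write: no clock sent
values, clock = key.get()
print(values)                                      # ['v1', 'v2']
key.put("v1+v2", client_id="alice", vclock=clock)  # read-before-write
print(key.get()[0])                                # ['v1+v2']
```

The resolved write carries the merged clock, so it descends both siblings and replaces them with a single value.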

> This seems like a great way to blow away siblings by accident. 

But it should never do that: if siblings are encountered, it should *do* 
something.

> Or is the point rather to avoid sibling explosion for applications that don't 
> care about losing information?

A well-behaved client library will not blindly PUT a value "over the top" of 
siblings, but will push the problem to the library user (hopefully in some 
helpful way, like automatically applying some domain-specific resolution 
logic).

So, in the case of the Java client, when you store (or fetch, for that matter) 
you must provide an implementation of the ConflictResolver interface to the 
client; this is then executed to resolve any siblings on the pre-store fetch. 
If you don't provide a conflict resolver, the Java client uses one that throws 
a runtime exception when it encounters siblings on fetch, exactly so that you 
don't do what you describe and blow away potentially meaningful sibling values.
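A rough Python analogue of the pattern described for the Java client: the caller supplies a resolver that turns a list of siblings into one value, and the default refuses to guess. The names `Resolver`, `RaiseOnSiblings`, `UnionResolver`, `UnresolvedSiblingsError` and `resolve_before_store` are hypothetical and are not the Java client's API. The set-union rule is just one example of domain-specific logic, suited to data such as a set of tags, where no element should be lost.

```python
from typing import FrozenSet, List, Protocol


class UnresolvedSiblingsError(RuntimeError):
    """Raised instead of silently picking one sibling and dropping the rest."""

    def __init__(self, siblings: list) -> None:
        super().__init__(f"{len(siblings)} siblings found; refusing to overwrite them")
        self.siblings = siblings


class Resolver(Protocol):
    """Turns the sibling values of one key into a single value."""

    def resolve(self, siblings: list): ...


class RaiseOnSiblings:
    """Default: pass a single value through, but never choose among siblings."""

    def resolve(self, siblings: list):
        # Assumes at least one value; a not-found is handled before resolution.
        if len(siblings) > 1:
            raise UnresolvedSiblingsError(siblings)
        return siblings[0]


class UnionResolver:
    """Domain-specific example: siblings are sets, so merge them losslessly."""

    def resolve(self, siblings: List[FrozenSet[str]]) -> FrozenSet[str]:
        merged: FrozenSet[str] = frozenset()
        for s in siblings:
            merged |= s
        return merged


def resolve_before_store(siblings: list, resolver: Resolver = RaiseOnSiblings()):
    """What a client might run on the pre-store fetch."""
    return resolver.resolve(siblings)


print(sorted(resolve_before_store([frozenset({"a"}), frozenset({"b"})], UnionResolver())))
# ['a', 'b']
try:
    resolve_before_store(["x", "y"])
except UnresolvedSiblingsError as err:
    print(err)  # 2 siblings found; refusing to overwrite them
```

Making the refusing resolver the default means forgetting to configure one fails loudly instead of losing data.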

Maybe the wording on the wiki should make this clearer; perhaps it should read:

"Clients [that automatically fetch before store] _must_ choose to either perform 
automatic Sibling Resolution *or* abort the write and notify the caller of the 
presence of siblings."
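The proposed wording can be read as a rule for a client library's store operation. The Python sketch below assumes a hypothetical backend with `fetch` and `put` methods (`FakeBackend` is an in-memory stand-in, not a real Riak client) and treats the vector clock as an opaque token, as a client normally would. It writes without a clock only on not-found, resolves siblings when the caller supplied a resolver, and otherwise aborts and notifies the caller rather than overwriting.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fetched = Optional[Tuple[List[str], str]]  # (sibling values, opaque vclock) or None


class SiblingsPresent(Exception):
    """Tells the caller the write was aborted because siblings exist."""

    def __init__(self, key: str, siblings: List[str]) -> None:
        super().__init__(f"write to {key!r} aborted: {len(siblings)} siblings")
        self.key = key
        self.siblings = siblings


class FakeBackend:
    """Hypothetical in-memory stand-in for a Riak connection."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[List[str], str]] = {}
        self.puts: List[Tuple[str, str, Optional[str]]] = []  # log of writes

    def fetch(self, key: str) -> Fetched:
        return self.data.get(key)

    def put(self, key: str, value: str, vclock: Optional[str]) -> None:
        self.puts.append((key, value, vclock))
        self.data[key] = ([value], f"vc-{len(self.puts)}")


def store(backend: FakeBackend, key: str, update: Callable[[Optional[str]], str],
          resolver: Optional[Callable[[List[str]], str]] = None) -> str:
    """Read-before-write: resolve siblings or abort; never overwrite blindly."""
    fetched = backend.fetch(key)
    if fetched is None:
        # Not found: there is no clock to send, so write the new value as-is.
        value = update(None)
        backend.put(key, value, vclock=None)
        return value
    siblings, vclock = fetched
    if len(siblings) > 1:
        if resolver is None:
            raise SiblingsPresent(key, siblings)  # abort and notify the caller
        current = resolver(siblings)
    else:
        current = siblings[0]
    value = update(current)
    backend.put(key, value, vclock=vclock)  # send back the clock we just read
    return value


db = FakeBackend()
store(db, "greeting", lambda old: "hello")
store(db, "greeting", lambda old: old + " world")
print(db.puts[-1])  # ('greeting', 'hello world', 'vc-1')
db.data["counter"] = (["1", "2"], "vc-x")  # simulate siblings
try:
    store(db, "counter", lambda old: old)
except SiblingsPresent as err:
    print(err)  # write to 'counter' aborted: 2 siblings
print(store(db, "counter", lambda old: old, resolver=max))  # 2
```

Passing `max` as the resolver is only for illustration; a real application would supply logic that fits its data.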

It is a thorny issue, please let me know if I've answered your question 
adequately.

Cheers

Russell

> 
> --
> Andres
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Clarifying "Read-before-Write"

2011-11-26 Thread Andres Jaan Tack
Thanks! That explanation is perfect. I guess I should have looked at some of
the other clients as examples in the first place.

Now I have something to fix for Riak-Cpp. :)

--
Andres



Setting up a project on top of Riak

2011-11-26 Thread Mathieu D'Amours
Hi all,

I'm sorry if my question sounds silly, but I'm not quite used to the OTP way of 
developing an application. There seems to be something I'm not seeing in all 
that.

We're going to develop an application that will use Riak Core, KV, Luwak and 
Search. We might need to create a new storage backend for Riak KV, we will 
design a particular distributed processing scheme using Riak Core, and we will 
build an HTTP interface using Webmachine.

For the moment, we need to build modules, test suites and so on, and those are 
closely tied to the Erlang components I just mentioned, which is pretty much 
all of the Riak components. So I need those external components to work as 
well as they would if I installed them from the prepared Riak package 
(http://github.com/basho/riak). If I use rebar and the OTP way of building 
packages, I end up with quite a few Erlang applications under deps/ (like 
riak_core, riak_control, eleveldb, etc.), and those just don't work out of the 
box the way they seem to when we grab the prepared Riak package.

How should we set up a development environment so that Riak and all the other 
Erlang applications work, and let us build our own pieces (storage backend, 
vnodes, web app) on top of it? What's a good way to keep development flexible? 
Do we always have to rebuild a brand-new package after each modification?

Thanks a lot for your attention,

Mathieu
