Re: Easier to use Java Client

Kresten Krab Thorup Tue, 29 Mar 2011 02:03:52 -0700

One thing, which is often missed by newcomers to Riak [I'm not saying you 
missed it], is the importance of managing client IDs, and passing the right 
vector clocks back to the server.

 { Basho'ers ... please correct me if I'm wrong }

Kresten



So, Rule #1 (which has two clauses), which you can always fall back on:

1.a / every client needs a client ID, which is distinct for that client.  Be 
sure to always pass it along in all calls (in Java that is done by calling 
setClientID on the RiakClient; at the HTTP level, it is done by passing the 
X-Riak-ClientId HTTP header).

1.b / when you send an update (HTTP PUT or DELETE), always pass along the 
X-Riak-Vclock header from a corresponding GET.  If you don't do this, your PUT 
is likely to go to /dev/null, because Riak thinks that it is a replay of an old 
request.

Until you're really familiar with how Riak works, you should always do these 
two, or you will be severely burned when you realize that it doesn't behave as 
expected.  Believe me, I've been there.
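
At the HTTP level, Rule #1 could look roughly like the sketch below.  It only 
builds a PUT request (nothing is sent) so you can see where the two headers go.  
Note that Riak's HTTP API spells the vector clock header X-Riak-Vclock.  The 
class name, the host/port and the /riak/<bucket>/<key> URL layout are my 
assumptions for illustration, and the vclock value is a placeholder for 
whatever your preceding GET returned.  It uses java.net.http, so Java 11+.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a Riak PUT that follows Rule #1.
// RiakPutSketch is a hypothetical helper, not part of any Riak client library.
class RiakPutSketch {

    // clientId: this client's distinct ID (rule 1.a).
    // vclock:   the X-Riak-Vclock value returned by the preceding GET (rule 1.b).
    static HttpRequest buildPut(String url, String clientId, String vclock, String body) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("X-Riak-ClientId", clientId)
                .header("X-Riak-Vclock", vclock)
                .header("Content-Type", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // URL and vclock are placeholders; use the values from your own GET.
        HttpRequest put = buildPut(
                "http://localhost:8098/riak/test/doc", "AAAAAQ==", "vclock-from-get", "hello");
        System.out.println(put.method() + " " + put.uri());
        put.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

The point is just that both headers travel with every update; how you send the 
request is up to your HTTP library.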


1.a / Choosing a good client ID
===============================

If you don't choose a client ID, Riak will do it for you ... BUT ... it will 
choose a new one for EVERY REQUEST.  This has many issues, so Riak should 
really require YOU to come up with one instead; perhaps it will do so at some 
point in the future.

Riak has some special optimizations if your client ID is the Base64-encoding of 
a byte array of length 4.  So, a good, default way to choose a client id is 
thus:

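        import java.security.SecureRandom;
        import java.util.Base64;   // java.util.Base64 needs Java 8+

        // The class name is only illustrative.
        class ClientIds {
                // One SecureRandom shared by all threads (it is thread-safe).
                static SecureRandom rnd = new SecureRandom();

                // Each thread lazily gets its own client ID on first use.
                static ThreadLocal<String> CLIENT_ID = new ThreadLocal<String>() {
                        @Override
                        protected String initialValue() {
                                return randomClientID();
                        }
                };

                public static String getClientID() {
                        return CLIENT_ID.get();
                }

                // Base64-encode 4 random bytes, the form Riak optimizes for.
                static private String randomClientID() {
                        byte[] bytes = new byte[4];
                        rnd.nextBytes(bytes);
                        return Base64.getEncoder().encodeToString(bytes);
                }
        }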

This makes it so that each thread in your application is assigned its own 
random client ID, which is often useful if your client is multi-threaded.

The above code is *a lot* better than the default of having the server side 
choose a new client ID for every request.

If you have some kind of logically unique, non-concurrent client concept in 
your system, that may be even better.  It could e.g. be the IMEI of your mobile 
phone, if your Riak client app is running on a phone; or it could be a user ID, 
if you are sure that only one user is accessing the system at a time.
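
If you go the "logical client" route and still want the 4-byte form mentioned 
above, one possible way (my assumption, not something Riak prescribes) is to 
hash the logical ID and keep the first 4 bytes.  LogicalClientId and 
clientIdFor are made-up names.  Be aware that truncating to 4 bytes means two 
different logical IDs can end up with the same client ID, so this is only a 
sketch of the idea.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: derives a stable, 4-byte, Base64-encoded client ID
// from a logical identifier such as an IMEI or a user ID.
class LogicalClientId {

    static String clientIdFor(String logicalId) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(logicalId.getBytes(StandardCharsets.UTF_8));
            // Keep only the first 4 bytes (collisions are possible, see above).
            return Base64.getEncoder().encodeToString(Arrays.copyOf(digest, 4));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same logical ID always maps to the same client ID.
        System.out.println(clientIdFor("user-42"));
        System.out.println(clientIdFor("user-42"));
    }
}
```

Unlike the random per-thread ID, this one stays the same across restarts, which 
is only safe if that logical client really never runs concurrently with itself.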


1.b / Passing the VectorClock
=============================

Secondly, you need to make sure that you pass the vector clock. 

You should think of the vector clock as an opaque "optimistic concurrency 
token", that you receive when you do a GET, and have to pass in when you do a 
PUT ... and then you get a new "optimistic concurrency token", that you have to 
use henceforth.

Depending on the configuration of your buckets, using an old vector clock will 
simply cause the PUT request to be ignored (if allow_mult=false), or cause 
siblings to be created (if allow_mult=true).  This is where Riak is often "not 
what you expect", but there is a good reason for this behavior.

IT IS ABSOLUTELY PARAMOUNT TO UNDERSTAND THIS.
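
To make the two outcomes concrete, here is a toy, in-memory model of a single 
key.  It is NOT how Riak is implemented: a plain counter stands in for the 
vector clock (real vector clocks are richer than that), and ToyBucket and its 
methods are names I made up.  It only mimics the behavior described above: a 
PUT with a stale token is ignored when allow_mult=false and becomes a sibling 
when allow_mult=true.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ONE key in a bucket. Hypothetical names; not Riak's algorithm.
class ToyBucket {
    private final boolean allowMult;
    private long version = 0;                        // stands in for the vector clock
    private final List<String> values = new ArrayList<>();

    ToyBucket(boolean allowMult) { this.allowMult = allowMult; }

    // What a GET would hand back as the "optimistic concurrency token".
    String getToken() { return Long.toString(version); }

    List<String> getValues() { return new ArrayList<>(values); }

    // Returns the token the caller has to use henceforth.
    String put(String token, String value) {
        boolean upToDate = token.equals(getToken());
        if (upToDate) {
            values.clear();          // normal case: the new value replaces the old
            values.add(value);
            version++;
        } else if (allowMult) {
            values.add(value);       // stale token: keep both values as siblings
            version++;
        }
        // stale token and allow_mult=false: the PUT is silently ignored
        return getToken();
    }

    public static void main(String[] args) {
        for (boolean mult : new boolean[] {false, true}) {
            ToyBucket bucket = new ToyBucket(mult);
            String token = bucket.put(bucket.getToken(), "v1");
            bucket.put(token, "from A");   // A uses the current token
            bucket.put(token, "from B");   // B reuses the same, now stale, token
            System.out.println("allow_mult=" + mult + " -> " + bucket.getValues());
        }
    }
}
```

Client B did nothing obviously wrong, it just held on to an old token, and that 
is exactly the situation that surprises people.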



The above two things (1.a and 1.b) are difficult for newcomers to understand 
and a bit tricky to get right, so IMHO a new Java client should make avoiding 
these mistakes the default behavior.

- So, it should choose a good client ID for you if you don't.
- And it should make it so that you can't do UPDATE/PUT without having first 
GOT'en the riak object.

The last part is especially tricky.  Perhaps we should have the API look like 
this to help that ....

  interface RiakObject {
     ...
  }

  interface UpdateableRiakObject extends RiakObject { ... }
  interface CreateableRiakObject extends RiakObject { ... }

  interface RiakClient {
      // sends PUT
      UpdateableRiakObject update(UpdateableRiakObject o) throws NotModified;

      // sends PUT
      UpdateableRiakObject create(CreateableRiakObject o) throws AlreadyThere;

      UpdateableRiakObject get(String bucket, String key);

      CreateableRiakObject fresh(String bucket, String key);
  }

I.e. do NOT EXPOSE constructors for the implementors of RiakObject.  The only 
way to get an UpdateableRiakObject is to call RiakClient.get, or as the result 
of calling update/create; you can't just allocate one.  Also, calling 
update/create should "invalidate" the original object so that it cannot 
accidentally be used again.

I really think we need to have a way to enforce the linear nature of these 
things.  Otherwise people get fooled.
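
Here is a minimal in-memory sketch of that idea, just to show the 
"invalidation" part.  Everything beyond the names in the API outline above is 
my assumption: the objects are classes with non-public constructors rather 
than interfaces, a counter stands in for the vector clock, the exceptions are 
unchecked to keep it short, NotModified is taken to mean "your token was 
stale, nothing was written", and InMemoryRiakClient talks to a HashMap, not to 
Riak.

```java
import java.util.HashMap;
import java.util.Map;

class AlreadyThere extends RuntimeException {}
class NotModified extends RuntimeException {}

abstract class RiakObject {
    final String bucket, key;
    private String value;
    private boolean used = false;

    // Not public: code outside this package cannot allocate one.
    RiakObject(String bucket, String key, String value) {
        this.bucket = bucket; this.key = key; this.value = value;
    }
    public String getValue() { return value; }
    public void setValue(String v) { value = v; }

    // Called by the client; a second call means the object was reused.
    void consume() {
        if (used) throw new IllegalStateException("object was already passed to update/create");
        used = true;
    }
}

final class CreateableRiakObject extends RiakObject {
    CreateableRiakObject(String bucket, String key) { super(bucket, key, null); }
}

final class UpdateableRiakObject extends RiakObject {
    final long token;   // stands in for the vector clock
    UpdateableRiakObject(String bucket, String key, String value, long token) {
        super(bucket, key, value);
        this.token = token;
    }
}

class InMemoryRiakClient {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> tokens = new HashMap<>();

    CreateableRiakObject fresh(String bucket, String key) {
        return new CreateableRiakObject(bucket, key);
    }

    // Returns null if nothing is stored under bucket/key.
    UpdateableRiakObject get(String bucket, String key) {
        String id = bucket + "/" + key;
        if (!values.containsKey(id)) return null;
        return new UpdateableRiakObject(bucket, key, values.get(id), tokens.get(id));
    }

    UpdateableRiakObject create(CreateableRiakObject o) throws AlreadyThere {
        o.consume();
        if (values.containsKey(o.bucket + "/" + o.key)) throw new AlreadyThere();
        return store(o, 1L);
    }

    UpdateableRiakObject update(UpdateableRiakObject o) throws NotModified {
        o.consume();
        Long current = tokens.get(o.bucket + "/" + o.key);
        if (current == null || current != o.token) throw new NotModified();
        return store(o, current + 1);
    }

    // Writes the value and hands back a fresh object carrying the new token.
    private UpdateableRiakObject store(RiakObject o, long token) {
        String id = o.bucket + "/" + o.key;
        values.put(id, o.getValue());
        tokens.put(id, token);
        return new UpdateableRiakObject(o.bucket, o.key, o.getValue(), token);
    }
}
```

Usage would be something like: fresh(), setValue(), create(), then keep working 
with whatever create()/update() returned.  Passing the same object to update() 
twice throws IllegalStateException instead of silently sending a stale token.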



Kresten
