Hi,

Thanks for trying Riak.

On 21 Feb 2013, at 23:48, Belai Beshah <belai.bes...@nwgeo.com> wrote:
> Hi All,
>
> We are evaluating Riak to see if it can be used to cache large blobs of data.
> Here is our test cluster setup:
>
> • six Ubuntu LTS 12.04 dedicated nodes with 8-core 2.6 GHz CPU, 32 GB RAM, 3.6 TB disk
> • {pb_backlog, 64},
> • {ring_creation_size, 256},
> • {default_bucket_props, [{n_val, 2}, {allow_mult, false}, {last_write_wins, true}]},
> • using Bitcask as the backend
>
> Everything else is default except the above. There is an HAProxy load balancer
> in front of the nodes that the clients talk to, configured according to the
> Basho wiki. Due to the nature of the application we are integrating, we do
> about 1200 writes/s of approximately 40-50 KB each and read them back almost
> immediately. We noticed a lot of read repairs, and since that is one of the
> things that can indicate a performance problem, we got worried. So we wrote a
> simple Java client application that simulates our use case. The test program
> is dead simple:
>
> • generate keys using random UUIDs and values using Apache commons RandomStringUtils
> • create a thread pool of 5 and store each key/value using "bucket.store()"
> • read the values back using "bucket.fetch()" multiple times
>
> I can provide the spike code if needed. What we noticed is that we get a
> lot of read repairs all over the place. We even made it use a single thread
> to read/write, played with the write/read quorums, and even put a delay of 5
> minutes between the writes and the reads to give the cluster time to become
> eventually consistent. Nothing helps; we always see a lot of read repairs,
> sometimes as many as the number of inserts.

It sounds like you are experiencing this bug: https://github.com/basho/riak_kv/pull/334

It is fixed in master, but it doesn't look like it made it into 1.3.0. If you're OK with building from source, I tried it, and the patch from commit 8895d2877576af2441bee755028df1a6cf2174c7 applies cleanly onto 1.3.0.
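For reference, a spike like the one described above can be sketched in plain JDK code as follows. This is only an approximation of the described test, not the original code: the `ConcurrentHashMap` is a hypothetical in-memory stand-in for the real Riak bucket (a real run would call `bucket.store()` / `bucket.fetch()` via the Riak Java client), and `randomValue()` is a stdlib substitute for Apache commons `RandomStringUtils`.

```java
import java.util.Random;
import java.util.UUID;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadRepairSpike {
    // Hypothetical stand-in for a Riak bucket; the real spike would use
    // bucket.store()/bucket.fetch() from the Riak Java client instead.
    static final ConcurrentMap<String, String> bucket = new ConcurrentHashMap<>();

    // Rough stdlib equivalent of RandomStringUtils.randomAlphanumeric()
    static String randomValue(int length) {
        String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        Random r = ThreadLocalRandom.current(); // per-thread RNG, safe in the pool
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(chars.charAt(r.nextInt(chars.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(5); // thread pool of 5
        AtomicInteger readBack = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                String key = UUID.randomUUID().toString(); // random UUID key
                String value = randomValue(45 * 1024);     // ~45 KB value
                bucket.put(key, value);                    // store ...
                if (value.equals(bucket.get(key))) {       // ... then fetch immediately
                    readBack.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("read back " + readBack.get() + " of 100 keys");
    }
}
```

Against a real cluster, the write-then-immediate-read pattern is exactly what triggers read repair when replicas have not yet converged, which is why adjusting R/W quorums changes what the coordinating node sees.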
Cheers

Russell

> The good thing is that in all of these tests we have not seen any read
> failures. Performance is also not bad; there are a few maxes here and there
> we don't like, but 90% looks good. Even when we killed a node, the reads
> were still successful.
>
> We are wondering what the expected ratio of read repairs is, and what a
> reasonable time is for the cluster not to resort to read_repair to fulfill a
> read request, or whether there is something we are missing in our setup.
>
> Thanks
> Belai

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com