On 1 Mar 2013, at 17:39, Belai Beshah <belai.bes...@nwgeo.com> wrote:
> Nothing fancy really, the set method throws an exception
> "com.basho.riak.client.RiakRetryFailedException: java.io.EOFException". Tried
> to find anything that could explain it in the error or console logs but
> found nothing.

Some questions: Are you using the PB client? Do you have anything in your riak
logs that points at a pb socket crash? What version of the RJC are you using,
please?

Cheers

Russell

>
> ________________________________________
> From: Kresten Krab Thorup [k...@trifork.com]
> Sent: Friday, March 01, 2013 5:40 AM
> To: Belai Beshah
> Cc: Jared Morrow; riak-users@lists.basho.com; Russell Brown
> Subject: Re: Understanding read_repairs
>
> Interesting. What does the failure look like?
>
> Kresten
>
> On Feb 27, 2013, at 11:25 PM, Belai Beshah
> <belai.bes...@nwgeo.com> wrote:
>
> I see my post is not clear, the 0.1% is a get/set failure, not a slowdown. We
> would have been OK with a slow response, but a failed response from the AAE
> was not something we can tolerate. Since the Java client by default does 3
> retries I didn't see any point in adding more retries to see if it works
> with more.
>
> ________________________________
> From: Jared Morrow [ja...@basho.com]
> Sent: Wednesday, February 27, 2013 2:21 PM
> To: Belai Beshah
> Cc: Russell Brown; riak-users@lists.basho.com
> Subject: Re: Understanding read_repairs
>
> Belai,
>
> Active Anti-Entropy is doing work building trees and checking data, so it
> will slow down gets/puts slightly. If you can't accept the slight
> performance hit, disabling it is the right choice. In our testing, if you
> use eLevelDB, 1.3.0 with AAE enabled is faster than 1.2.1 without AAE in most
> cases due to the other speedups added to eLevelDB in 1.3.0. Since Bitcask
> runs about the limit of what a filesystem can handle, AAE definitely shows a
> slight performance hit since it is accessing the filesystem as well.
>
> Glad to hear the patch solved your other issues.
>
> -Jared
>
>
> On Wed, Feb 27, 2013 at 1:13 PM, Belai Beshah
> <belai.bes...@nwgeo.com> wrote:
> Patch worked well on 1.3, no more continuous read repairs. However, we
> started seeing problems with Set/Get of about 0.1% which was not there in the
> 1.2 release. Since this happens even without the patch on a clean 1.3
> install we narrowed it down to being Active Anti-Entropy since it looks like
> it is always actively fixing data. Maybe it is our write and read immediately
> pattern or the fact that we have only a single 4TB disk behind each node and
> they can't keep up. With Active Anti-Entropy turned off all our tests passed
> and performance returned to 1.2 levels without any read repairs. For now we
> are happy to continue our tests with Active Anti-Entropy turned off but it
> would be great if we could get some pointers from the experts that could
> explain the behavior we saw. Thank you guys for the help.
>
> ________________________________
> From: Jared Morrow [ja...@basho.com]
> Sent: Friday, February 22, 2013 11:56 AM
> To: Belai Beshah
> Cc: Russell Brown; riak-users@lists.basho.com
> Subject: Re: Understanding read_repairs
>
>
> Belai,
>
> One other option is to use our "basho-patches" functionality. We use it to
> run new code on current installations where sending a new .beam file is
> easier than remaking the packages or compiling from source. On your Ubuntu
> system using our packages, the folder should be in
> /usr/lib/riak/lib/basho-patches.
>
> To do this you just need the one file changed in the PR pointed to by Russell.
>
> Here are the steps to make that happen:
>
> * Install Erlang R15B01:
>   http://docs.basho.com/riak/latest/tutorials/installation/Installing-Erlang/
> * Get riak_kv: git clone git://github.com/basho/riak_kv.git
> * Compile riak_kv with just 'make'
> * Copy the resulting .beam file in the ebin folder to the machines that need
>   the new file: scp ebin/riak_kv_vnode.beam
>   user@myriaknode:/usr/lib/riak/lib/basho-patches
> * Stop each node and restart them one at a time
> * If you want to convince yourself you are using the new code, you can do a
>   'riak attach' to attach to the node and run code:which('riak_kv_vnode').
>   (Don't forget the '.' at the end)
>
> For example on my dev install here is the command before the file is in
> basho-patches:
>
> (dev2@127.0.0.1)1> code:which('riak_kv_vnode').
> ".../lib/riak_kv-1.3.0/ebin/riak_kv_vnode.beam"
>
> Here is the command after I put the .beam in the basho-patches directory:
>
> (dev2@127.0.0.1)1> code:which('riak_kv_vnode').
> ".../lib/basho-patches/riak_kv_vnode.beam"
>
> Notice the path of the code changed from .../riak_kv-1.3.0/... to
> .../basho-patches/...
>
> That might seem like a lot of work, but it is a really handy and useful
> trick/skill that you might use quite a bit down the road.
>
> Hope that helps,
> Jared
>
>
> On Fri, Feb 22, 2013 at 10:25 AM, Belai Beshah
> <belai.bes...@nwgeo.com> wrote:
> Thanks Russell, that looks like exactly the problem we saw. I have never built
> riak from source before but I will give it a try this weekend.
>
> ________________________________________
> From: Russell Brown [russell.br...@me.com]
> Sent: Friday, February 22, 2013 1:24 AM
> To: Belai Beshah
> Cc: riak-users@lists.basho.com
> Subject: Re: Understanding read_repairs
>
> Hi,
> Thanks for trying Riak.
>
> On 21 Feb 2013, at 23:48, Belai Beshah
> <belai.bes...@nwgeo.com> wrote:
>
>> Hi All,
>>
>> We are evaluating Riak to see if it can be used to cache large blobs of
>> data. Here is our test cluster setup:
>>
>>   • six Ubuntu LTS 12.04 dedicated nodes with 8 core 2.6 GHz CPU, 32 GB
>>     RAM, 3.6T disk
>>   • {pb_backlog, 64},
>>   • {ring_creation_size, 256},
>>   • {default_bucket_props, [{n_val, 2},
>>     {allow_mult,false},{last_write_wins,true}]},
>>   • using bitcask as the backend
>>
>> Everything else is default except the above. There is an HAProxy load
>> balancer in front of the nodes that the clients talk to, configured
>> according to the basho wiki. Due to the nature of the application we are
>> integrating we do about 1200/s writes of approximately 40-50KB each and read
>> them back almost immediately. We noticed a lot of read repairs and since
>> that was one of the things that could indicate a performance problem we got
>> worried. So we wrote a simple java client application that simulates our use
>> case. The test program is dead simple:
>>
>>   • generate keys using random UUID and value using Apache commons
>>     RandomStringUtils
>>   • create a thread pool of 5 and store key/value using “bucket.store()”
>>   • read the values back using “bucket.fetch()” multiple times
>>
>> I could provide the spike code if needed. What we noticed is that we get a
>> lot of read repairs all over the place. We even made it use a single thread
>> to read/write, played with the write/read quorum and even put a delay of 5
>> minutes after the writes before the reads start to give the cluster time
>> to be eventually consistent. Nothing helps, we always see a lot of read
>> repairs, sometimes as many as the number of inserts.
>
>
> It sounds like you are experiencing this bug
> https://github.com/basho/riak_kv/pull/334
>
> It is fixed in master, but it doesn't look like it made it into 1.3.0.
> If you're ok with building from source, I tried it and a patch from
> 8895d2877576af2441bee755028df1a6cf2174c7 goes cleanly onto 1.3.0.
>
> Cheers
>
> Russell
>
>
>> The good thing is that in all of these tests we have not seen any read
>> failures. Performance is also not bad, a few maxes here and there we don't
>> like but 90% looks good. Even when we killed a node, the reads were still
>> successful.
>>
>> We are wondering what the expected ratio of read repairs is, what a
>> reasonable time is for the cluster not to resort to read_repair to fulfill a
>> read request, or whether there is something we are missing in our setup.
>>
>> Thanks
>> Belai
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
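
For readers following along, the test program described in the original post
(random UUID keys, random string values, a thread pool of 5, store then fetch
several times) could look roughly like the sketch below. This is not the
poster's spike code and it does not use the Riak Java client: KeyValueStore and
InMemoryStore are hypothetical stand-ins for a bucket so that the sketch runs
without a cluster, and randomValue() replaces Apache Commons RandomStringUtils.
To exercise a real cluster you would supply your own KeyValueStore that wraps
the client's store and fetch calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

public class StoreFetchSpike {

    /** Hypothetical stand-in for a bucket; NOT the Riak Java client API. */
    interface KeyValueStore {
        void store(String key, String value) throws Exception;
        String fetch(String key) throws Exception;
    }

    /** In-memory implementation so the sketch runs without a cluster. */
    static class InMemoryStore implements KeyValueStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public void store(String key, String value) { data.put(key, value); }
        public String fetch(String key) { return data.get(key); }
    }

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    /** Random alphanumeric payload of the requested length. */
    static String randomValue(int length) {
        StringBuilder sb = new StringBuilder(length);
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /**
     * Stores `count` objects using `threads` workers and reads each one back
     * `reads` times. Returns the number of failed or mismatched operations.
     */
    static int run(KeyValueStore store, int count, int threads, int reads,
                   int valueSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                results.add(pool.submit(() -> {
                    String key = UUID.randomUUID().toString();
                    String value = randomValue(valueSize);
                    try {
                        store.store(key, value);
                    } catch (Exception e) {
                        return 1; // failed write; nothing to read back
                    }
                    int failures = 0;
                    for (int r = 0; r < reads; r++) {
                        try {
                            if (!value.equals(store.fetch(key))) {
                                failures++; // missing or stale value
                            }
                        } catch (Exception e) {
                            failures++; // failed read
                        }
                    }
                    return failures;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 200 objects of roughly 45 KB, 5 threads, 3 reads per object.
        int failures = run(new InMemoryStore(), 200, 5, 3, 45 * 1024);
        System.out.println("failed operations: " + failures);
    }
}
```

Counting failures per operation rather than stopping at the first exception
makes it easier to report a failure rate such as the 0.1% mentioned earlier in
the thread. Read repairs themselves are not visible to a client like this; they
would have to be read from the node statistics.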