Hi Francisco,

I've seen the same error in a dev environment running on a single Riak node with an n_val of 1, so in my case it had nothing to do with a failing node. I wasn't running Riak Search either. I posted a question about it to this list a week or so ago but haven't seen a reply yet.
So indeed, does anyone know what's causing this error and how we can avoid it?

Regards,
Martin.

On 28 Sep 2011, at 20:39, francisco treacy <francisco.tre...@gmail.com> wrote:

> Regarding (3), I found a Forcing Read Repair contrib function
> (http://contrib.basho.com/bucket_inspector.html) which should help.
>
> Otherwise, for the m/r error: all of my buckets use the default n_val and
> write quorum. Could it be that some data never reached that particular node
> in the cluster? That is, should I have used W=3? During the failure, many
> assets were returning 404s, which triggered read repair (and were fine on a
> subsequent request), but no luck with the Map/Reduce function (it kept on
> failing). Could it have something to do with Riak Search?
>
> Thanks,
>
> Francisco
>
>
> 2011/9/26 francisco treacy <francisco.tre...@gmail.com>
>
> Hi all,
>
> I have a 3-node Riak cluster, and I am simulating the scenario of physical
> nodes crashing.
>
> When 2 nodes go down and I query the remaining one, it fails with:
>
> {error,
>     {exit,
>         {{{error,
>               {no_candidate_nodes,exhausted_prefist,
>                   [{riak_kv_mapred_planner,claim_keys,3},
>                    {riak_kv_map_phase,schedule_input,5},
>                    {riak_kv_map_phase,handle_input,3},
>                    {luke_phase,executing,3},
>                    {gen_fsm,handle_msg,7},
>                    {proc_lib,init_p_do_apply,3}],
>                   []}},
>           {gen_fsm,sync_send_event,
>               [<0.31566.2330>,
>                {inputs,
>
> (...)
>
> Here I'm doing a M/R, with the inputs being fed by Search.
>
> (1) All of the involved buckets have N=3, and all involved requests use R=1
> (I don't really need quorum for this use case).
>
> Why is it failing? I'm sure I'm missing something basic here.
>
> (2) Probably worth noting: those 3 nodes are spread across *two* physical
> servers (1 on a small one, 2 on a beefier one). I've heard it is "not a good
> idea", though I'm not sure why. These two servers are still definitely
> enough for our current load; should I consider adding a third one?
>
> (3) To overcome the aforementioned error, I added a new node to the cluster
> (installed on the small server). Now the setup looks like: 4 nodes = 2 on
> the small server, 2 on the beefier one.
>
> When 2 nodes go down, this works. Which brings me to another topic... could
> you point me to good strategies to "pre-"invoke read repair? Is it up to
> clients to scan the keyspace, forcing reads? It's a disaster usability-wise
> when the first users start getting 404s all over the place.
>
> Francisco
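For reference, Francisco asks above whether it's up to clients to scan the keyspace forcing reads. Below is a minimal sketch of that client-side approach against the HTTP interface. It assumes the old-style /riak/<bucket>/<key> endpoints, and the node address, bucket names, and query parameters are placeholders to be checked against your Riak version; it is not the bucket_inspector contrib function linked above, just the "walk the keys and GET each one" idea expressed in Python.

```python
#!/usr/bin/env python
"""Walk a bucket's keyspace and GET every object to nudge read repair.

Hypothetical sketch: RIAK_URL and BUCKETS are placeholders, and the
/riak/<bucket> key-listing and ?r=1 query parameter are assumed from the
old Riak HTTP API -- verify them against your node's version.
"""
import requests

RIAK_URL = "http://127.0.0.1:8098"   # placeholder node address
BUCKETS = ["assets", "users"]        # placeholder bucket names


def force_read_repair(bucket):
    # List all keys in the bucket (expensive: scans the whole keyspace).
    resp = requests.get("%s/riak/%s" % (RIAK_URL, bucket),
                        params={"keys": "true", "props": "false"})
    resp.raise_for_status()
    keys = resp.json().get("keys", [])

    for key in keys:
        url = "%s/riak/%s/%s" % (RIAK_URL, bucket,
                                 requests.utils.quote(key, safe=""))
        # A GET with r=1 returns as soon as one replica answers and lets
        # Riak repair stale or missing replicas in the background.
        r = requests.get(url, params={"r": "1"})
        if r.status_code == 404:
            # The first read after a failure may 404 even though read
            # repair was triggered; a second read usually finds the object.
            requests.get(url)
    print("%s: touched %d keys" % (bucket, len(keys)))


if __name__ == "__main__":
    for b in BUCKETS:
        force_read_repair(b)
```

Listing keys walks the entire keyspace, so something like this is only practical on small buckets or as a one-off after bringing failed nodes back; otherwise read repair happens lazily as objects are read in the normal course of traffic.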
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com