This morning one node went down (3-node 0.14 cluster) and I started getting the dreaded `no_candidate_nodes,exhausted_prefist` error posted earlier.
If 2 nodes are remaining, and I always use N=3 R=1 ... why is it failing?
Something to do with my use of Search?

Thanks,

Francisco

2011/9/28 Martin Woods <mw2...@gmail.com>

> Hi Francisco
>
> I've seen the same error in a dev environment running on a single Riak
> node with an n_val of 1, so in my case it was nothing to do with a failing
> node. I wasn't running Riak Search either. I posted a question about it to
> this list a week or so ago but haven't seen a reply yet.
>
> So indeed, does anyone know what's causing this error and how we can avoid
> it?
>
> Regards,
> Martin.
>
>
> On 28 Sep 2011, at 20:39, francisco treacy <francisco.tre...@gmail.com> wrote:
>
> Regarding (3) I found a Forcing Read Repair contrib function
> (http://contrib.basho.com/bucket_inspector.html) which should help.
>
> Otherwise, for the m/r error: all of my buckets use the default n_val and
> write quorum. Could it be that some data never reached that particular node
> in the cluster? That is, should I have used W=3? During the failure, many
> assets were returning 404s, which triggered read repair (and they were fine
> on the subsequent request), but no luck with the Map/Reduce function (it
> kept on failing). Could it have something to do with Riak Search?
>
> Thanks,
>
> Francisco
>
>
> 2011/9/26 francisco treacy <francisco.tre...@gmail.com>
>
>> Hi all,
>>
>> I have a 3-node Riak cluster, and I am simulating the scenario of
>> physical nodes crashing.
>>
>> When 2 nodes go down and I query the remaining one, it fails with:
>>
>> {error,
>>     {exit,
>>         {{{error,
>>               {no_candidate_nodes,exhausted_prefist,
>>                   [{riak_kv_mapred_planner,claim_keys,3},
>>                    {riak_kv_map_phase,schedule_input,5},
>>                    {riak_kv_map_phase,handle_input,3},
>>                    {luke_phase,executing,3},
>>                    {gen_fsm,handle_msg,7},
>>                    {proc_lib,init_p_do_apply,3}],
>>                   []}},
>>           {gen_fsm,sync_send_event,
>>               [<0.31566.2330>,
>>                {inputs,
>>
>> (...)
>>
>> Here I'm doing an M/R, with the inputs fed by Search.
>>
>> (1) All of the involved buckets have N=3, and all involved requests use
>> R=1 (I don't really need quorum for this use case).
>>
>> Why is it failing? I'm sure I'm missing something basic here.
>>
>> (2) Probably worth noting: those 3 nodes are spread across *two* physical
>> servers (1 on a small one, 2 on a beefier one). I've heard that this is
>> "not a good idea", though I'm not sure why. These two servers are still
>> definitely enough for our current load; should I consider adding a third
>> one?
>>
>> (3) To overcome the aforementioned error, I added a new node to the
>> cluster (installed on the small server). Now the setup looks like: 4 nodes
>> = 2 on the small server, 2 on the beefier one.
>>
>> When 2 nodes go down, this works. Which brings me to another topic...
>> could you point me to good strategies to "pre-"invoke read repair? Is it
>> up to clients to scan the keyspace, forcing reads? It's a disaster
>> usability-wise when the first users start getting 404s all over the place.
>>
>> Francisco
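
For what it's worth, the "scan the keyspace forcing reads" approach can be done
from an attached Erlang console on one of the nodes. Below is a rough sketch,
assuming the internal riak_client API that ships with 0.14; it is not the
bucket_inspector contrib module, and <<"assets">> is just a placeholder bucket
name.

%% From `riak attach` on a running node.
{ok, C} = riak:local_client().

%% Sanity-check the props the bucket really has (n_val, quorum defaults).
C:get_bucket(<<"assets">>).

%% List every key in the bucket and read each one once. Each get lets Riak
%% compare the replicas it can reach and repair any that are missing or
%% stale. list_keys/1 walks the whole keyspace, so run this off-peak.
{ok, Keys} = C:list_keys(<<"assets">>).
[C:get(<<"assets">>, Key, 1) || Key <- Keys].

That still means touching every key once after a node comes back, which is
essentially what the contrib function automates; as far as I know there is no
built-in "repair everything now" command in 0.14.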
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com