riakc_pb_socket client & poolboy
Hi,

I'm trying to use poolboy to organize a pool of connections to riak nodes with the protobuf client.

This comes from the suggestion here:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-September/009346.html

The quote:
"Using Poolboy is convenient because it comes as a dependency of riak_core. If you use Poolboy, you'll have to modify riakc_pb_socket slightly to account for the way poolboy initializes connections (add a start_link/1), or create a simple module to pass the initialization from poolboy to riakc_pb_socket."

What I'm pondering is what would be a convenient way to start riakc_pb_socket clients in a pool. For example, if I have 10 riak nodes in a cluster and want to open a single connection to each of them (using riakc_pb_socket), I need one poolboy worker per riakc_pb_socket instance. In that case, I need the host/port of a single riak node for each riakc_pb_socket:start_link(). So I use WorkerArgs in poolboy:start_link/3 to pass a list of riak nodes. And here comes the confusion: poolboy passes WorkerArgs to each of its workers, so every worker gets the whole list of riak nodes. Now I need to somehow choose a node, maybe as proposed here:
https://github.com/basho/riak-erlang-client/pull/45/files
But I don't think that's a good solution.

Maybe my idea of using riakc_pb_socket together with poolboy is not convenient? Maybe I should start a pool per riak node? Maybe poolboy is not as convenient in this case as it seems?

Would you share your experiences and thoughts on the subject?

Thank you.
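P.S. To make the "simple module" approach from the quote concrete, here is the kind of wrapper I have in mind. This is only a sketch: the module name riakc_pool_worker is made up, and it assumes a poolboy version whose workers are started via Worker:start_link(WorkerArgs) with a single proplist argument.

    %% riakc_pool_worker.erl (hypothetical): lets poolboy start riakc_pb_socket
    %% workers without patching riakc_pb_socket itself.
    -module(riakc_pool_worker).
    -behaviour(poolboy_worker).
    -export([start_link/1]).

    start_link(Args) ->
        %% With one pool per riak node, Args carries exactly one host/port.
        Host = proplists:get_value(host, Args, "127.0.0.1"),
        Port = proplists:get_value(port, Args, 8087),
        riakc_pb_socket:start_link(Host, Port).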
Re: riakc_pb_socket client & poolboy
Hi Dmitry, ;)

Yes, thank you. I'm also thinking about this approach. I just wanted to know how people use poolboy for that (it seems they do).

But even considering your case, imagine that you have quite a big cluster and quite a high volume of rps. In that case you may want more than one load balancer, both to spread the load and to be more tolerant of LB failures. Exactly the same situation arises. It's not a problem to implement a pooler for this case on my own; I'm just asking about using poolboy for that.

On Mon, Sep 24, 2012 at 12:08 PM, Dmitry Demeshchuk wrote:
> Why not use a load balancer on top of the Riak cluster, independently from
> clients? If the load balancing is sophisticated enough, it can even do a better
> job than just uniformly spreading requests between machines.
>
> Consider the following situation. You happen to send several requests for a
> huge piece of data (each can be just a simple get, or even a complicated
> map-reduce query), but the rest of the nodes, or just some of them, receive
> much smaller queries.
>
> What an external load balancer can do is gather some basic stats from
> each machine (like load average, memory and IO consumption, number of rps to
> Riak and so on) and balance the requests according to this data. Another
> benefit of this approach is that you can easily add any other Riak client
> (Python, Ruby, whatever else) on top of it, and the load will still be nicely
> distributed between the machines.
>
> On Mon, Sep 24, 2012 at 11:38 AM, Yuri Lukyanov wrote:
>> [original message quoted above]
>
> --
> Best regards,
> Dmitry Demeshchuk
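P.S. If anyone wants to try the pool-per-node layout, a sketch might look like the following. The supervisor name and node list are made up, and it assumes poolboy exposes child_spec/3 (pool name, pool args, worker args) plus the hypothetical riakc_pool_worker wrapper sketched earlier in the thread.

    %% riak_pools_sup.erl (hypothetical): one poolboy pool per riak node, so
    %% each pool's WorkerArgs holds a single host/port.
    -module(riak_pools_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    -define(NODES, [{riak_pool_1, "riak1.example.com", 8087},
                    {riak_pool_2, "riak2.example.com", 8087}]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        PoolSpecs =
            [poolboy:child_spec(Name,
                                [{name, {local, Name}},
                                 {worker_module, riakc_pool_worker},
                                 {size, 5}, {max_overflow, 10}],
                                [{host, Host}, {port, Port}])
             || {Name, Host, Port} <- ?NODES],
        {ok, {{one_for_one, 10, 10}, PoolSpecs}}.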
"exception exit: disconnected" when reading old 0.13.0 riak data with protobuf client
Hi,

I have a cluster with old 0.13.0 riak nodes installed. The Erlang native API client is used to get/put data to/from the cluster. Now I'm trying to start using the protobuf client (riakc_pb_socket). The problem is that the pb client can't read data saved with the old native riak_client.

Steps to reproduce:

1> {ok, C} = riak:client_connect('riak@localhost').
{ok,{riak_client,riak@localhost,<<4,81,114,138>>}}
2> C:put(riak_object:new(<<"test">>, <<"key">>, {1, 2})).
ok
3> C:get(<<"test">>, <<"key">>).
{ok,{r_object,<<"test">>,<<"key">>,
     [{r_content,{dict,2,16,16,8,80,48,
                       {[],[],[],[],[],[],[],[],[],[],[],[],...},
                       {{[],[],[],[],[],[],[],[],[],[],...}}},
                 {1,2}}],
     [{<<106,124,17,114,80,92,14,183>>,{1,63515448872}},
      {<<7,141,174,80>>,{1,63515448872}},
      {<<131,98,4,179,111,197>>,{1,63515729078}},
      {<<5,81,122,124>>,{1,63515729078}},
      {<<4,81,114,138>>,{1,63515729568}},
      {<<131,98,3,57,103,94>>,{1,63515729568}}],
     {dict,1,16,16,8,80,48,
           {[],[],[],[],[],[],[],[],[],[],[],[],[],...},
           {{[],[],[],[],[],[],[],[],[],[],[],...}}},
     undefined}}
4> {ok, Pid} = riakc_pb_socket:start_link("localhost", 8087).
{ok,<0.13770.0>}
5> riakc_pb_socket:get(Pid, <<"test">>, <<"key">>).
** exception exit: disconnected

On the erlang node, the following error occurs:

=ERROR REPORT==== 24-Sep-2012::22:12:48 ===
** Generic server <0.4086.41> terminating
** Last message in was {tcp,#Port<0.96519>,
                            [9|<<10,4,116,101,115,116,18,3,107,101,121>>]}
** When Server state == {state,#Port<0.96519>,
                               {riak_client,riak@localhost,<<3,139,240,244>>},
                               undefined,undefined}
** Reason for termination ==
** {function_clause,[{riakclient_pb,encode,[1,{1,2}]},
                     {riakclient_pb,pack,5},
                     {riakclient_pb,iolist,2},
                     {riakclient_pb,encode,2},
                     {riakclient_pb,pack,5},
                     {riakclient_pb,pack,5},
                     {riakclient_pb,iolist,2},
                     {riakc_pb,encode,1}]}

I suppose that it is somehow connected with the fact that the value is passed as an erlang term, not as clear binary data ({1, 2} vs term_to_binary({1, 2})). Is there something I can do about that?

Btw, where can I find the riakclient_pb.erl source code?
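P.S. In case the root cause really is the raw Erlang term, one workaround I'm considering (an untested sketch, and it obviously means rewriting the data) would be to store term_to_binary/1 output and decode on read, since the PB interface only carries opaque binaries:

    %% Write through the native client as a plain binary...
    {ok, C} = riak:client_connect('riak@localhost'),
    Obj = riak_object:new(<<"test">>, <<"key">>, term_to_binary({1, 2})),
    ok = C:put(Obj),

    %% ...then the PB client can read it and the application decodes the term.
    {ok, Pid} = riakc_pb_socket:start_link("localhost", 8087),
    {ok, RObj} = riakc_pb_socket:get(Pid, <<"test">>, <<"key">>),
    Value = binary_to_term(riakc_obj:get_value(RObj)).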
Re: Using Riak as cache layer
I suggest that you use http://www.couchbase.com/ (ex-membase) instead as a cache layer. It's faster but less reliable than riak, which is fine for a cache layer.

On Mon, Oct 1, 2012 at 12:47 AM, Pavel Kogan wrote:
> Hi all experts,
>
> I want to use Riak for caching and have a few questions:
>
> 1) How much faster is the memory back-end compared to the bitcask back-end (on SSD)?
> 2) If the throughput is satisfying, is there any reason to use more than two nodes?
> 3) When my memory reaches the preset limit (let's say 4Gb), what is going to
>    happen on inserting the next element?
>    a) A random element will be dropped.
>    b) The oldest element will be dropped.
>    c) The next element insert will fail.
>
> Thanks,
>    Pavel
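P.S. On question 3, in case you do stay with Riak's memory backend: as far as I remember it evicts the least-recently-used entries once the per-vnode cap is reached (so closest to option b), and the cap plus an optional TTL are set in app.config. A sketch, with key names as I recall them from the docs, so please double-check before relying on it:

    %% app.config fragment (sketch)
    {riak_kv, [
        {storage_backend, riak_kv_memory_backend},
        {memory_backend, [
            {max_memory, 4096},   %% MB per vnode; LRU-style eviction once reached
            {ttl, 86400}          %% optional, seconds until entries expire
        ]}
    ]},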
Riak 1.2: stalled handoff
Hi,

I was adding the 7th node to one of our riak 1.2 clusters. Everything was OK until the process suddenly stopped with one handoff left:

# riak-admin transfers
Attempting to restart script through sudo -H -u riak
riak@nsto0r6 waiting to handoff 1 partitions

Active Transfers:

Note that no transfers are displayed. At first I thought it was just a temporary pause, but it's been about 12 hours since then.

Here is what member-status shows:

# riak-admin member-status
Attempting to restart script through sudo -H -u riak
================================= Membership ==================================
Status     Ring       Pending    Node
-------------------------------------------------------------------------------
valid      14.1%      14.1%      riak@nsto0r0
valid      14.1%      14.1%      riak@nsto0r1
valid      14.1%      14.1%      riak@nsto0r2
valid      14.1%      14.1%      riak@nsto0r3
valid      15.6%      15.6%      riak@nsto0r4
valid     *15.6%      14.1%*     riak@nsto0r5
valid     *12.5%      14.1%*     riak@nsto0r6
-------------------------------------------------------------------------------
Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

What would be the reason for such behaviour and how can I investigate it further? How can I force the handoff?

Thanks.
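P.S. Would it be safe to poke the handoff machinery from an attached console on the stuck node? This is just a guess based on calls I've seen suggested elsewhere, so please correct me if they are wrong for 1.2:

    %% From `riak attach` on riak@nsto0r6:

    %% See what the handoff manager thinks is in flight.
    riak_core_handoff_manager:status().

    %% Ask the vnodes to kick off any pending handoffs right away.
    riak_core_vnode_manager:force_handoffs().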
Re: Riak 1.2: stalled handoff
The backend is leveldb.

Ring status:

# riak-admin ring-status
Attempting to restart script through sudo -H -u riak
================================== Claimant ===================================
Claimant:   riak@nsto0r1
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
Owner:      riak@nsto0r5
Next Owner: riak@nsto0r6

Index: 0
  Waiting on: [riak_kv_vnode]
  Complete:   [riak_pipe_vnode]

-------------------------------------------------------------------------------

============================== Unreachable Nodes ==============================
All nodes are up and reachable

# riak-admin ringready
Attempting to restart script through sudo -H -u riak
TRUE All nodes agree on the ring [riak@nsto0r0,riak@nsto0r1,riak@nsto0r2,
                                  riak@nsto0r3,riak@nsto0r4,riak@nsto0r5,
                                  riak@nsto0r6]

Unfortunately this is only the current status; things have changed since I wrote the previous email. What I see now by running riak-admin transfers is:

# riak-admin transfers
Attempting to restart script through sudo -H -u riak
riak@nsto0r6 waiting to handoff 1 partitions
riak@nsto0r3 waiting to handoff 1 partitions
riak@nsto0r2 waiting to handoff 1 partitions
riak@nsto0r1 waiting to handoff 1 partitions

Active Transfers:

transfer type: ownership_handoff
vnode type: riak_kv_vnode
partition: 0
started: 2013-04-06 05:36:06 [1.93 min ago]
last update: 2013-04-06 06:08:11 [2.01 s ago]
objects transferred: 5184001

                 2692 Objs/s
riak@nsto0r5  ===>  riak@nsto0r6
                931.18 KB/s

So it looks like the partition is finally being transferred. I don't understand why new handoffs keep appearing, though (maybe that's a different story with riak@nsto0r1; it looks like it crashed for some reason).

Hm, is there a known reason why a riak cluster may stop handoffs for a period of time?

Thank you.

On Sat, Apr 6, 2013 at 2:42 AM, Alexander Moore wrote:
> Also, what's the output of riak-admin ringready?
>
> Thanks,
> Alex
>
> On Fri, Apr 5, 2013 at 6:37 PM, Alexander Moore wrote:
>> Hi Yuri,
>>
>> What backend are you using?
>>
>> What does riak-admin ring-status output?
>>
>> Thanks,
>> Alex
>>
>> On Fri, Apr 5, 2013 at 6:30 PM, Yuri Lukyanov wrote:
>>> [original message quoted above]
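P.S. One thing I'm now wondering about (an assumption on my side, so corrections welcome): whether the queueing is simply the per-node handoff concurrency limit, which I believe defaults to 2 on 1.2. Something along these lines should raise it, if that is the bottleneck:

    %% app.config fragment (sketch)
    {riak_core, [
        {handoff_concurrency, 4}   %% simultaneous handoffs per node
    ]},

    %% or at runtime from `riak attach`, without restarting the node:
    riak_core_handoff_manager:set_concurrency(4).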
A story about unexpected changes in riak-erlang-client dependencies
Hi,

To begin with, I almost fucked up our production servers with these changes in riak_pb:
https://github.com/basho/riak_pb/commit/2505ff1fa3975b93150d7445b6f7b91940ecb970

Well, this issue boils down to a commonly known rebar-related problem. I was even hesitating about which mailing list I should send this message to, riak-users@ or rebar@.

The thing is that rebar does not allow you to pin dependencies to a certain version. When you do this in your rebar.config:

{deps, [
    {riakc, "1.3.1",
     {git, "https://github.com/basho/riak-erlang-client.git", "c377347bb0"}}
]}.

you actually do _nothing_ to protect yourself from unexpected changes. Any dependency may have its own dependencies, and so does riakc. And guess what we see in riakc's rebar.config at commit "c377347bb0"? Right:

{deps, [
    {riak_pb, ".*", {git, "git://github.com/basho/riak_pb", "master"}}
]}.

Welcome, master. And here is the welcoming message on your production servers when you do hot code reloading:

crasher:
  initial call: riakc_pb_socket:init/1
  pid: <0.3364.6566>
  registered_name: []
  exception exit: {{{badrecord,rpbgetreq},
                    [{riak_kv_pb,iolist,2,[{file,"src/riak_kv_pb.erl"},{line,48}]},
                     {riak_kv_pb,encode,2,[{file,"src/riak_kv_pb.erl"},{line,40}]},
                     {riak_pb_codec,encode,1,[{file,"src/riak_pb_codec.erl"},{line,77}]},
                     {riakc_pb_socket,send_request,2,[{file,"src/riakc_pb_socket.erl"},{line,1280}]},
                     {riakc_pb_socket,handle_call,3,[{file,"src/riakc_pb_socket.erl"},{line,759}]},
                     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,588}]},
                     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},
                   [{gen_server,terminate,6,[{file,"gen_server.erl"},{line,747}]},
                    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}

I see that the situation is better now and the latest master version of the riakc rebar.config looks like this:

{deps, [
    {riak_pb, "1.4.0.2", {git, "git://github.com/basho/riak_pb", {tag, "1.4.0.2"}}}
]}.

That helps, but only a bit. Let's look at the riak_pb rebar.config...
https://github.com/basho/riak_pb/blob/master/rebar.config

And voilà:

{deps, [
    {protobuffs, "0.8.*", {git, "git://github.com/basho/erlang_protobuffs.git", "master"}}
]}.

New interesting things to come in the future. Just wait for it.

And if we go through any Erlang project on github that uses rebar, we will see a lot of the same stuff. No one seems to worry about this much. The question is why? Is it only me who is facing these issues? It doesn't look like it. I see messages appearing again and again on the rebar mailing list and in the rebar issues list on github about the same matter. People invent hacks and patches to somehow improve the situation. For example, the following rebar plugin recursively locks all deps by creating a separate rebar.config:
https://github.com/seth/rebar_lock_deps_plugin
(and my fork of this: https://github.com/lukyanov/rebar-lock-deps)

But all of those attempts are just partial solutions. The real solution would be to have a single central dependency repository, like Ruby has, for instance. While we, as the Erlang community, still do not have one, why don't we start _never_ using masters in our rebar.config's and make this a habit? I strongly urge Basho to be a good example of this, as you guys always are.

Thanks.
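P.S. One workaround that should help here, as far as I can tell (it relies on rebar keeping the first copy of a dep it encounters, with top-level deps encountered first, so please verify against your rebar version): restate the transitive deps at the top level, pinned to tags, so nobody's "master" ever gets pulled. The tag names below are only illustrative.

    %% Top-level rebar.config (sketch)
    {deps, [
        {riakc, ".*",
         {git, "https://github.com/basho/riak-erlang-client.git",
          {tag, "1.3.1.1"}}},

        %% Restated here so these pinned versions win over whatever the
        %% rebar.config's of riakc and riak_pb happen to point at:
        {riak_pb, ".*",
         {git, "git://github.com/basho/riak_pb", {tag, "1.4.0.2"}}},
        {protobuffs, ".*",
         {git, "git://github.com/basho/erlang_protobuffs.git", {tag, "0.8.0"}}}
    ]}.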
Re: A story about unexpected changes in riak-erlang-client dependencies
Glad to hear you are moving away from master dependencies. It would be great if all of us followed this way.

It's sad that even if you completely sort out your own deps and rebar.config's, it does not solve the problem in general. You may still want to use third-party deps which are not necessarily good in terms of dependency management policy. And you end up forking those projects for one reason only: to fix their rebar.config's and create tags. Not to mention the headache when you want to update your forked versions with newer versions from the original repositories...

Looks like we are far away from the perfect dependency world :)

On Tue, Apr 9, 2013 at 7:15 PM, Reid Draper wrote:
> Yuri,
>
> You're certainly not the only one dealing with this. Sorry about that. It's
> bit us here at Basho a few times too. As you saw, we're moving toward not
> having master dependencies, at least for tagged versions of repos, especially
> for projects that are likely to be dependencies in other projects (ie. our
> client libraries, lager, etc.). That being said, it will take us some time to
> get there. You should also be aware of another 'fun' issue with rebar: if
> multiple dependencies require the same dep, the version that rebar first
> encounters will be the one that gets used. You'll find it helpful to use
> `./rebar list-deps` to see exactly what's being used in your project. I agree
> that this is something the Erlang community could improve, but in the
> short-term we're going to be more diligent about recursively using tags
> for all tagged repos.
>
> Reid
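P.S. For the third-party case, the lock-deps plugin I linked earlier is the least painful thing I've found so far: after a normal build it records the exact revision of every fetched dep into a separate rebar.config.lock, which you then build from. Roughly (quoting from memory of the plugin's README, so double-check the exact commands):

    # rebar.config needs {plugins, [rebar_lock_deps_plugin]} plus the plugin
    # itself listed as a dep; then:
    ./rebar get-deps compile
    ./rebar lock-deps                              # writes rebar.config.lock
    ./rebar -C rebar.config.lock get-deps compile  # reproducible build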
Simultaneous handoff and merge
Hi,

I have a cluster of 17 riak (1.2.1) nodes with bitcask as a backend.

Recently one of the nodes was down for a while. After the node had been started, the cluster started doing handoffs, as expected. But then a merge process also began on the same node. I know this from log messages like this:

2013-04-18 08:14:09.061 [info] <0.22952.79> Merged ["/var/lib/riak/bitcask/496682197061674038608283517368424307461195825152"

And then something went wrong (the logs on the same node):

2013-04-18 08:39:22.217 [error] <0.31842.70> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.4000.80> exit with reason {timeout,{gen_server,call,[riak_core_handoff_manager,{add_outbound,riak_kv_vnode,208378163135070142634509751539626289911881007104,riak@nsto2r5,<0.4000.80>}]}} in context child_terminated

2013-04-18 08:42:46.067 [error] <0.5154.80> gen_server <0.5154.80> terminated with reason: {timeout,{gen_server,call,[riak_core_handoff_manager,{add_inbound,[]}]}}

2013-04-18 08:42:52.790 [error] <0.5154.80> CRASH REPORT Process riak_core_handoff_listener with 1 neighbours exited with reason: {timeout,{gen_server,call,[riak_core_handoff_manager,{add_inbound,[]}]}} in gen_server:terminate/6 line 747

2013-04-18 08:42:53.450 [error] <0.31847.70> Supervisor riak_core_handoff_listener_sup had child riak_core_handoff_listener started with riak_core_handoff_listener:start_link() at <0.5154.80> exit with reason {timeout,{gen_server,call,[riak_core_handoff_manager,{add_inbound,[]}]}} in context child_terminated

The node itself was disappearing from time to time:

# riak-admin ring-status
Node is not running!

The beam process was still running, though.

Maybe it's not related to handoffs & merge; it was just a guess.

Any information and advice on this would be greatly appreciated. It's still happening right now, and I could gather more details if someone wants me to.

Thanks in advance.
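P.S. A thought I had while digging (just an assumption on my side, so please correct me): the merge that kicked in right after the restart is driven by the bitcask trigger settings, so raising them would at least make a freshly restarted node less eager to start a big merge while it is also receiving handoffs. Setting names are as I remember them from the bitcask docs, and the values shown are only roughly the defaults:

    %% app.config fragment (sketch)
    {bitcask, [
        %% a data file becomes a merge candidate when this % of it is dead keys...
        {frag_merge_trigger, 60},
        %% ...or when it holds this many bytes of dead data (~512 MB)
        {dead_bytes_merge_trigger, 536870912}
    ]},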
Re: Simultaneous handoff and merge
More information on this. We have merge_window set to {6, 9}. Every day at this time our cluster gets heavily overloaded; ring-status often shows many nodes unreachable.

What would you suggest? Would {merge_window, always} be better, since all nodes would then be merging at different times? I still have concerns about this: even if only one node is merging at the moment, it looks like the whole cluster is significantly affected.

On Thu, Apr 18, 2013 at 1:07 PM, Yuri Lukyanov wrote:
> [original message quoted above]
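P.S. For reference, this is the knob in question (sketch of the bitcask section of app.config). An alternative I'm considering — again, just an idea we haven't tested — is keeping a fixed window but staggering it per node, so only one or two nodes merge in any given hour:

    %% app.config fragment (sketch)
    {bitcask, [
        %% current setting: merge only between 06:00 and 09:00 (all nodes at once)
        {merge_window, {6, 9}}

        %% alternatives:
        %%   {merge_window, always}   %% no time restriction at all
        %%   {merge_window, {9, 12}}  %% e.g. a different window on each node
    ]},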