Playing with / understanding Riak configurations
Hi - I have a Riak node with n_val=3, r=2, w=2 and just one key-object stored in it. I'm trying out various configurations to better understand the system and have the following observations. Some don't align with my understanding so far, so I'd appreciate it if someone could shed some light. Thanks!

1. n=3, r=2, w=2: Base state, 1 key-value pair.

2. Change to n=2, r=2, w=2: When I query from my client, I randomly see 1 or 2 values being fetched. In fact, the number of keys fetched is 1 or 2, changing randomly each time the client queries the db. I would have expected that reducing n_val would cause data loss on one of the vnodes, and that in this scenario I would still read only 1 (remaining) key-value pair from the two vnodes that still have the data. Note that I don't intend to make such a change in production, as I'm aware of the recommendation never to decrease the value of n; I have done so only to test out the details.

3. Then change to n=2, r=1, w=1: I get the same alternating result as above, i.e. 1 or 2 values being fetched.

4. Then change to n=1, r=1, w=1: I get 3 key-value pairs, all identical, from the database. Again, are these all siblings?

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
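As a side note on the four configurations listed above, here is a minimal sketch of the textbook strict-quorum overlap check (r + w > n). It is only an illustration: the function name is made up for this example, it does not model Riak's sloppy quorums or hinted handoff, and it says nothing about replicas left behind on vnodes after n_val is lowered.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any r read replicas must intersect any w written replicas out of n.

    Textbook strict-quorum condition only; not a model of Riak internals.
    """
    return r + w > n


# The four (n, r, w) configurations tried in the message above.
configs = [(3, 2, 2), (2, 2, 2), (2, 1, 1), (1, 1, 1)]
results = {cfg: quorums_overlap(*cfg) for cfg in configs}

for (n, r, w), ok in results.items():
    print(f"n={n} r={r} w={w}: read/write quorums overlap = {ok}")
```

Only the n=2, r=1, w=1 case fails the overlap condition, so the arithmetic alone does not account for the alternating results reported in steps 2 and 3.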
Re: Playing with / understanding Riak configurations
Vikram,

John Daily wrote a fantastic blog series that places your question in context and then answers it.

http://basho.com/posts/technical/understanding-riaks-configurable-behaviors-part-1/

Tom

On Wed, Jul 27, 2016 at 4:07 PM, Vikram Lalit wrote:
> Hi - I have a Riak node with n_val=3, r=2, w=2 and have just one key-object stored there-in. [...]
Re: Playing with / understanding Riak configurations
Thanks Tom... Yes, I did read that, but I couldn't deduce the outcome if n is decreased. John talks about data loss, but I am actually observing a different result... perhaps I am missing something!

On Wed, Jul 27, 2016 at 6:11 PM, Tom Santero wrote:
> John Daily wrote a fantastic blog series that places your question in context and then answers it.
>
> http://basho.com/posts/technical/understanding-riaks-configurable-behaviors-part-1/
riak TS max concurrent queries + overload error
Hello!

We are experiencing error messages from the client that we don’t totally understand. They look like the following:

Checking the riak error and crash logs, I’m seeing “overload” errors, which I assume are causing the “no response from backend” client errors:

{error, badarg,
 [{erlang,iolist_to_binary,[overload],[]},
  {riak_kv_ts_svc,make_rpberrresp,2,[{file,"src/riak_kv_ts_svc.erl"},{line,483}]},
  {riak_kv_ts_svc,sub_tsqueryreq,4,[{file,"src/riak_kv_ts_svc.erl"},{line,445}]},
  {riak_kv_pb_ts,process,2,[{file,"src/riak_kv_pb_ts.erl"},{line,71}]},
  {riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,388}]},
  {riak_api_pb_server,connected,2,[{file,"src/riak_api_pb_server.erl"},{line,226}]},
  {riak_api_pb_server,decode_buffer,2,[{file,...},...]},...]}

I’m curious whether these overload errors are caused by clients requesting more concurrent TS queries than our current setting for timeseries_max_concurrent_queries allows, OR whether timeseries_max_concurrent_queries is set too high and we are causing riak to crash. Do you have any recommendations on what timeseries_max_concurrent_queries should be set to relative to hardware specs? I assume it should be limited based on disk I/O bandwidth.

Also, does anyone have any recommendations on query pooling so we can guarantee that multiple clients will not generate more queries than the cluster can handle? I like HAProxy for HTTP connection pooling, but it doesn’t seem like it would work well for limiting the number of global queries from multiple PBC clients.

Thank you!
Chris
Re: Playing with / understanding Riak configurations
Vikram,

I apologize, I initially just skimmed your question and thought you were asking something entirely different.

While increasing your N value is safe, decreasing it on a bucket with pre-existing data, as you have done, is not recommended and is the source of your inconsistent results.

Tom

On Wed, Jul 27, 2016 at 4:18 PM, Vikram Lalit wrote:
> Thanks Tom... Yes I did read that but I couldn't deduce the outcome if n is decreased. John talks about data loss, but am actually observing a different result... perhaps am missing something!
Re: riak TS max concurrent queries + overload error
Hi Chris,

This sounds like the issue described at https://github.com/basho/riak_kv/issues/1418

On Wed, Jul 27, 2016 at 11:19 PM, wrote:
> Also, does anyone have any recommendations on query pooling so we can
> guarantee that multiple clients will not generate more queries than the
> cluster can handle?

Probably the right thing to do (when the RPC server is fixed) is to have the clients independently check for backpressure from Riak (e.g. overload messages like this), retry with exponential backoff, and have each retry increment a counter somewhere in your monitoring system to make that problem visible. This should allow you to handle overload (somewhat) gracefully, respond to critical events (e.g. an alert), or see any overload trends over time.

Cian
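The retry approach suggested above could look roughly like the following sketch. Everything named here is an assumption for illustration: `OverloadError` is a hypothetical exception that your own client wrapper would raise when it recognizes an overload response from Riak, `on_retry` stands in for whatever counter your monitoring system exposes, and the default delays are arbitrary. It is not part of any Riak client library.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class OverloadError(Exception):
    """Hypothetical: raised by the caller's own wrapper when Riak reports overload."""


def call_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    on_retry: Optional[Callable[[int], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on OverloadError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except OverloadError:
            if attempt == max_retries:
                raise  # give up and surface the overload to the caller
            if on_retry is not None:
                on_retry(attempt + 1)  # e.g. increment a monitoring counter
            # Delay ceiling doubles per attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random wait in [0, delay] so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A caller would wrap each TS query in a function that translates the client's overload response into `OverloadError`, then pass that function as `operation`. The random jitter is a design choice so that several clients backing off at once do not all retry at the same instant.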
Re: riak TS max concurrent queries + overload error
Hi Cian,

Thank you! I should've mentioned in my initial email that I thought we were experiencing the same bug you called out (in fact, the 2nd comment on that GitHub issue is from me). So, what I'm really curious about is whether the original "overload" error is happening because we're hitting the limit on TS max concurrent queries, or whether riak is actually "overloaded" and we shouldn't increase the configuration value for max concurrent queries.

I'd like to know whether I should expect a certain value for max concurrent queries to be stable and performant for some given hardware specs. This is an experiment that we will probably run in house to determine a good value, but it would be great to know what range is expected to perform well.

Also, I have no idea if the max concurrent queries setting includes subqueries over multiple quanta. For instance, if I have 4 TS queries hitting a riak node configured for 12 max queries and each query spans 3 - 4 quanta, should I expect an "overload" error?

Thank you for the advice on implementing client backoff! Hopefully, we can do that as well as increase the overall TS query capacity of our cluster with a simple configuration change. I suspect that we have a very conservative value at the moment.

Chris

From: Cian Synnott
Sent: Wednesday, July 27, 2016 6:03 PM
To: Johnson Chris CJOH
Cc: riak-users@lists.basho.com
Subject: Re: riak TS max concurrent queries + overload error

> Hi Chris, This sounds like the issue described at https://github.com/basho/riak_kv/issues/1418 [...]
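To make the arithmetic behind the quanta question concrete, here is a small sketch. It assumes quanta are fixed-width time buckets aligned to multiples of the quantum size, and it uses a hypothetical 15-minute quantum and a made-up time range. Whether Riak TS actually counts each per-quantum subquery against the max concurrent queries limit is exactly the open question in this thread; the code only shows what the totals would be if it did.

```python
def quanta_spanned(start_ms: int, end_ms: int, quantum_ms: int) -> int:
    """Number of fixed-width quanta touched by the half-open range [start_ms, end_ms)."""
    if end_ms <= start_ms:
        return 0
    first = start_ms // quantum_ms          # index of the quantum holding the start
    last = (end_ms - 1) // quantum_ms       # index of the quantum holding the last instant
    return last - first + 1


MINUTE_MS = 60 * 1000
QUANTUM_MS = 15 * MINUTE_MS  # hypothetical 15-minute quantum
LIMIT = 12                   # max concurrent queries from the example in the thread
QUERIES = 4                  # concurrent TS queries from the example in the thread

# A hypothetical query covering minute 10 up to minute 50.
per_query = quanta_spanned(10 * MINUTE_MS, 50 * MINUTE_MS, QUANTUM_MS)
total_if_counted = QUERIES * per_query

print(f"quanta per query: {per_query}")
print(f"subqueries if each quantum counted: {total_if_counted} (limit {LIMIT})")
```

Under these assumptions, 4 queries of 3 quanta each would sit exactly at a limit of 12, while 4 queries of 4 quanta each would exceed it, which is why the answer to the subquery question matters for choosing the setting.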