Re: riak TS max concurrent queries + overload error
On Thu, Jul 28, 2016 at 6:10 AM, wrote:

> Thank you! I should've mentioned in my initial email that I thought we were
> experiencing the same bug you called out (in fact the 2nd comment on that
> github issue is actually from me).

Aha, cool. :o)

> So, what I'm really curious about is whether or not the original "overload"
> error is happening because we're hitting the limit on TS max concurrent
> queries or if riak is actually "overloaded" and we shouldn't increase the
> configuration value for max concurrent queries.

I looked into this when examining the bug, and it *is* triggered by hitting
the max concurrent queries limit, which, as you've noted, is set
nervous-alpha-software low by default. A plain `overload` is a little
unhelpful in that the same error is used deeper within Riak KV too, but in
this case I'm confident you're hitting the one in Riak TS's query path.

> I'd like to know whether or not I should expect a certain value for max
> concurrent queries to be stable and performant for some given hardware
> specs. This is an experiment that we will probably run in house to
> determine a good value, but it would be great to know what range is
> expected to perform well.

I don't think there is a range expected to perform well yet. The PBC server
simply dying on overload suggests it hasn't really been load-tested much at
Basho, so sharing whatever you come up with on the list would be good. :o)

> Also, I have no idea if the max concurrent queries setting includes
> subqueries over multiple quanta. For instance, if I have 4 TS queries
> hitting a riak node configured for 12 max queries and each query spans
> 3-4 quanta, should I expect an "overload" error?

No, max concurrent queries does not include subqueries.
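The distinction above is between whole queries and the per-quantum subqueries a query is compiled into. As a rough illustration of the arithmetic (not the actual compiler code; the function name and the half-open boundary handling are assumptions), counting how many quantum-aligned subqueries a time range splits into looks like:

```python
def count_subqueries(start_ms, end_ms, quantum_ms):
    """Number of quantum-aligned chunks a [start, end) time range covers.

    Illustrative only -- the real splitting lives in riak_kv_qry_compiler;
    names and boundary handling here are assumptions.
    """
    first = start_ms // quantum_ms         # quantum index of the range start
    last = (end_ms - 1) // quantum_ms      # quantum index of the last point
    return last - first + 1

# A query spanning 3 full 15-minute quanta compiles to 3 subqueries, and
# per the answer above, those don't count against max concurrent queries.
q = 15 * 60 * 1000
print(count_subqueries(0, 3 * q, q))       # 3
print(count_subqueries(q // 2, 3 * q, q))  # still touches 3 quanta
```

So the questioner's 4 queries of 3-4 quanta each would count as 4 against the concurrency limit, not 12-16.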
Digging around in the code, the max subqueries configuration is used in the
query compiler, and the error in that case is `too_many_subqueries`:

https://github.com/basho/riak_kv/blob/2.1.3-ts/src/riak_kv_qry_compiler.erl#L533

I'm not sure that error is plumbed properly back through the PBC server's
error responses, and the code is a little more twisty than I have time to
check right now.

If I understand the code correctly, overload due to max concurrent queries
is hit when there are more than 3 queries waiting to be served by the query
FSMs, which are started around here:

https://github.com/basho/riak_kv/blob/2.1.3-ts/src/riak_kv_qry_sup.erl#L61

So, `timeseries_max_concurrent_queries` gives us the number of query FSMs
per node. There's a short, static overflow queue of queries, and if the FSMs
can't keep up, you get the overload message.

I don't know why the default number of query FSMs running per node is so
low. Perhaps early customers were using it purely interactively, at a
command prompt? In any case, try setting it much higher and see how you get
on.

Cian

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
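For reference, the knob discussed above is an app environment setting in `riak_kv`. A sketch of raising it via advanced.config, assuming the `timeseries_max_concurrent_queries` name from the 2.1.3-ts source maps through unchanged (verify the exact key against the config schema of the build you run):

```erlang
%% advanced.config -- a sketch, not verified against every TS release.
[
 {riak_kv, [
   %% Number of query FSMs per node; the default is very low.
   {timeseries_max_concurrent_queries, 32}
 ]}
].
```

A node restart is needed for advanced.config changes to take effect.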
Re: Playing with / understanding Riak configurations
Thanks Tom...

On Wed, Jul 27, 2016 at 6:27 PM, Tom Santero wrote:

> Vikram,
>
> I apologize, I initially just skimmed your question and thought you were
> asking something entirely different.
>
> While increasing your N value is safe, decreasing it on a bucket with
> pre-existing data, as you have, is not recommended and is the source of
> your inconsistent results.
>
> Tom
>
> On Wed, Jul 27, 2016 at 4:18 PM, Vikram Lalit wrote:
>
>> Thanks Tom... Yes I did read that, but I couldn't deduce the outcome if
>> n is decreased. John talks about data loss, but I'm actually observing a
>> different result... perhaps I'm missing something!
>>
>> On Wed, Jul 27, 2016 at 6:11 PM, Tom Santero wrote:
>>
>>> Vikram,
>>>
>>> John Daily wrote a fantastic blog series that places your question in
>>> context and then answers it.
>>>
>>> http://basho.com/posts/technical/understanding-riaks-configurable-behaviors-part-1/
>>>
>>> Tom
>>>
>>> On Wed, Jul 27, 2016 at 4:07 PM, Vikram Lalit wrote:

Hi - I have a Riak node with n_val=3, r=2, w=2 and just one key-object
stored there. I'm trying out various configurations to better understand the
system and have the following observations. Some don't seem to align with my
understanding so far, so I'd appreciate it if someone could throw some light
on this. Thanks!

1. n=3, r=2, w=2: Base state, 1 key-value pair.

2. Change to n=2, r=2, w=2: When I query from my client, I randomly see 1 or
2 values being fetched. In fact, the number of keys fetched is 1 or 2,
randomly changing each time the client queries the db. Ideally, I would have
expected that if we reduce the n_val, there would be data loss from one of
the vnodes, and that for this scenario I would still see only 1 (remaining)
key-value pair read from the remaining two vnodes that hold the data.
Note that I don't intend to make such a change in production, as I'm
cognizant of the recommendation never to decrease the value of n, but have
done so only to test out the details.

3. Then change to n=2, r=1, w=1: I get the same alternating result as above,
i.e. 1 or 2 values being fetched.

4. Then change to n=1, r=1, w=1: I get 3 key-value pairs, all identical,
from the database. Again, are these all siblings?

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
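Beyond the not-recommended n_val decrease that Tom identifies as the cause here, the n/r/w combinations in the steps above can also be checked against the standard quorum-overlap rule: a read is only guaranteed to see the latest write when the read and write quorums must intersect. A minimal sketch of that arithmetic (illustrative only, not Riak client code):

```python
def quorums_overlap(n, r, w):
    """True when any r-replica read set must intersect any w-replica write
    set out of n replicas, so a successful read sees the latest write."""
    return r + w > n

# Base state n=3, r=2, w=2: quorums overlap, reads see current data.
print(quorums_overlap(3, 2, 2))  # True
# n=2, r=1, w=1: no overlap guarantee, even before replicas go stale.
print(quorums_overlap(2, 1, 1))  # False
```

This rule doesn't explain the flickering key counts (those come from the stale replicas left behind by lowering n_val), but it is worth keeping in mind when experimenting with r and w.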