Re: riak TS max concurrent queries + overload error

2016-07-28 Thread Cian Synnott
On Thu, Jul 28, 2016 at 6:10 AM,   wrote:
> Thank you! I should've mentioned in my initial email that I thought we were 
> experiencing the same bug you called out (in fact the 2nd comment on that 
> github issue is actually from me).
>
Aha, cool. :o)

> So, what I'm really curious about is whether or not the original "overload" 
> error is happening because we're hitting the limit on TS max concurrent 
> queries or if riak is actually "overloaded" and we shouldn't increase the 
> configuration value for max concurrent queries.
>
I looked into this when examining the bug, and it *is* triggered by
hitting the max concurrent queries limit, which as you've noted is set
nervous-alpha-software low by default. Plain `overload` is a little
unhelpful in that it is used deeper within Riak KV too, but in this
case I'm confident you're hitting the one in Riak TS's query path.

> I'd like to know whether or not I should expect a certain value for max 
> concurrent queries to be stable and performant for some given hardware specs. 
> This is an experiment that we will probably run in house to determine a good 
> value, but it would be great to know what range is expected to perform well.
>
I don't think there is a range expected to perform well, yet. The PBC
server just dying on overload suggests it hasn't really been load
tested much at Basho, so sharing whatever you come up with on the
list would be good. :o)

> Also, I have no idea if the max concurrent queries setting includes 
> subqueries over multiple quanta. For instance, if I have 4 TS queries hitting 
> a riak node configured for 12 max queries and each query spans 3 - 4 quanta, 
> should I expect an "overload" error?
>
No, max concurrent queries does not include this.

Digging around in the code, the max subqueries configuration is used
in the query compiler, and the error message in that case is
`too_many_subqueries`:
  https://github.com/basho/riak_kv/blob/2.1.3-ts/src/riak_kv_qry_compiler.erl#L533

which I'm not sure is plumbed properly back through the PBC server's
error responses, and the code is a little more twisty than I have time
to check right now.

If I understand the code correctly, overload due to max concurrent
queries is hit when there are more than 3 queries waiting to be served
by the query FSMs, which are started around here:
  https://github.com/basho/riak_kv/blob/2.1.3-ts/src/riak_kv_qry_sup.erl#L61

So, `timeseries_max_concurrent_queries` gives us the number of query
FSMs per node. There's a short, static overflow queue of queries, and
if the FSMs can't keep up, you get the overload message.
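
For what it's worth, here is a toy model of that behaviour in Python.
It is only a sketch of my reading of the description above, not Riak's
actual code: the class and method names are made up for illustration,
and the values passed in (3 FSMs, an overflow queue of 3) are
assumptions based on this thread rather than verified settings.

```python
from collections import deque


class QueryNode:
    """Toy model of one node's TS query admission (hypothetical, not Riak code)."""

    def __init__(self, max_concurrent_queries, overflow_len):
        # Number of query FSMs that are currently idle.
        self.free_fsms = max_concurrent_queries
        # Fixed capacity of the overflow queue (assumed, not configurable here).
        self.overflow_len = overflow_len
        self.waiting = deque()

    def submit(self, query):
        """Return 'running', 'queued' or 'overload' for a new query."""
        if self.free_fsms > 0:
            self.free_fsms -= 1
            return "running"
        if len(self.waiting) < self.overflow_len:
            self.waiting.append(query)
            return "queued"
        # All FSMs busy and the overflow queue is full.
        return "overload"

    def finish_one(self):
        """One FSM finishes; it takes the next waiting query, if there is one."""
        if self.waiting:
            return self.waiting.popleft()
        self.free_fsms += 1
        return None


node = QueryNode(max_concurrent_queries=3, overflow_len=3)
results = [node.submit(f"q{i}") for i in range(7)]
print(results)
# ['running', 'running', 'running', 'queued', 'queued', 'queued', 'overload']
```

In this model the seventh query arriving at once is the first to see
`overload`. Raising the first argument moves that point further out,
which is the effect I'd expect from raising the setting.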

I don't know why the default number of query FSMs running per node is
so low. Perhaps early customers were using it purely interactively, at
a command prompt? In any case, try setting it lots higher and see how
you get on.

Cian

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Playing with / understanding Riak configurations

2016-07-28 Thread Vikram Lalit
Thanks Tom...

On Wed, Jul 27, 2016 at 6:27 PM, Tom Santero  wrote:

> Vikram,
>
> I apologize, I initially just skimmed your question and thought you were
> asking something entirely different.
>
> While increasing your N value is safe, decreasing it on a bucket with
> pre-existing data, as you have, is not recommended and is the source of
> your inconsistent results.
>
> Tom
>
> On Wed, Jul 27, 2016 at 4:18 PM, Vikram Lalit 
> wrote:
>
>> Thanks Tom... Yes, I did read that, but I couldn't deduce the outcome if
>> n is decreased. John talks about data loss, but I am actually observing a
>> different result... perhaps I am missing something!
>>
>>
>> On Wed, Jul 27, 2016 at 6:11 PM, Tom Santero  wrote:
>>
>>> Vikram,
>>>
>>> John Daily wrote a fantastic blog series that places your question in
>>> context and then answers it.
>>>
>>>
>>> http://basho.com/posts/technical/understanding-riaks-configurable-behaviors-part-1/
>>>
>>> Tom
>>>
>>> On Wed, Jul 27, 2016 at 4:07 PM, Vikram Lalit 
>>> wrote:
>>>
>>>> Hi - I have a Riak node with n_val=3, r=2, w=2 and have just one
>>>> key-object stored in it. I'm trying to test various configurations to
>>>> better understand the system and have the following observations - some
>>>> don't seem to align with my understanding so far, so I'd appreciate it if
>>>> someone could shed some light please... Thanks!
>>>>
>>>> 1. n=3, r=2, w=2: Base state, 1 key-value pair.
>>>>
>>>> 2. Change to n=2, r=2, w=2: When I query from my client, I randomly see
>>>> 1 or 2 values being fetched. In fact, the number of keys fetched is 1 or 2,
>>>> randomly changing each time the client queries the db. Ideally, I would
>>>> have expected that if we reduce the n_val, there would be data loss from
>>>> one of the vnodes. And that for this scenario, I would still expect only 1
>>>> (remaining) key-value pair to be read from the remaining two vnodes that
>>>> have the data. Note that I don't intend to make such a change in production
>>>> as I am cognizant of the recommendation to never decrease the value of n,
>>>> but have done so only to test out the details.
>>>>
>>>> 3. Then change to n=2, r=1, w=1: I get the same alternating result as
>>>> above, i.e. 1 or 2 values being fetched.
>>>>
>>>> 4. Then change to n=1, r=1, w=1: I get 3 key-value pairs, all
>>>> identical, from the database. Again, are these all siblings?
