Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

Jay Patel Fri, 19 Sep 2014 19:45:57 -0700

Thanks, Tyler, for the clarification. I've opened a ticket, CASSANDRA-7982
<https://issues.apache.org/jira/browse/CASSANDRA-7982>. For now, I've
assigned it to myself and put you as a reviewer. Please change the
assignment as you prefer.


Assume that we now batch the requests & send only one request to the
replica:

What extra overhead do vnodes incur when the replica processes the secondary
index request? In other words, does the replica still have to fire
individual queries internally for each of the token ranges, e.g.
(max(-9193352069377957523), max(-9136021049555745100)], or can it be
optimized to be done in one shot? If it takes multiple queries, how much
overhead does that add (in terms of latency from multiple disk lookups, etc.)?
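
To make the batching question concrete, here is a rough Python sketch (not Cassandra code). It only groups the four ranges from the trace snippet quoted below by the replica that served them, and compares the request count with and without batching. The helper name group_ranges_by_replica is made up for illustration, and the sketch says nothing about how the replica would execute the batched ranges internally.

```python
from collections import defaultdict

# (start_token, end_token, replica) copied from the trace snippet in this thread.
TRACE_RANGES = [
    (-9223372036854775808, -9193352069377957523, "192.168.51.22"),
    (-9193352069377957523, -9136021049555745100, "192.168.51.25"),
    (-9136021049555745100, -8959555493872108621, "192.168.51.22"),
    (-8959555493872108621, -8929774302283364912, "192.168.51.25"),
]


def group_ranges_by_replica(ranges):
    """Hypothetical helper: collect the ranges served by the same replica."""
    grouped = defaultdict(list)
    for start, end, replica in ranges:
        grouped[replica].append((start, end))
    return dict(grouped)


batched = group_ranges_by_replica(TRACE_RANGES)
requests_unbatched = len(TRACE_RANGES)  # one request per token range
requests_batched = len(batched)         # one request per replica
print(requests_unbatched, requests_batched)
```

For these four ranges, the unbatched path sends four requests and the batched path sends two, one per replica.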

Would you mind pointing me to the C* code location (class/method) so I can
explore further?

Also, can you help me understand what min() and max() mean in the trace
output?
[min(-9223372036854775808), max(-9193352069377957523)] vs.
(max(-8959555493872108621), max(-8929774302283364912)]

Jay



On Fri, Sep 19, 2014 at 3:28 PM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel <pateljay3...@gmail.com> wrote:
>
>>
>> When the coordinator fires an indexed scan request to node 192.168.51.22,
>> why doesn't it ask that node to check all of its (at least primary) ranges
>> for the queried data at once? Also, internally that node should be able to
>> do just one scan through all of the ranges it holds, shouldn't it?
>> (e.g. [min(-9223372036854775808), max(-9193352069377957523)] and
>> (max(-9136021049555745100), max(-8959555493872108621)], etc.)
>>
>> It seems like it needs to query data in token order. So the first range,
>> [min(-9223372036854775808), max(-9193352069377957523)], is on 192.168.51.22.
>> But the next range, (max(-9193352069377957523), max(-9136021049555745100)],
>> is on 192.168.51.25, so it fires the query there. Then the next range,
>> (max(-9136021049555745100), max(-8959555493872108621)], is again on
>> 192.168.51.22. By the way, I'm not too sure about min/max vs. max/max in
>> the trace output.
>>
>
> The coordinator certainly could batch multiple range requests that are
> going to the same replica.  It's an optimization that would primarily help
> the empty table/high cardinality case, but you're welcome to open a
> ticket.  3.0 is the earliest this would make it in.
>
>
>>
>> I found below comment in
>> https://issues.apache.org/jira/browse/CASSANDRA-4858.
>> "The problem is that we have to scan the nodes in token order so we dont
>> break the existing API's, if we do so then we are sending a lot more
>> requests and waiting for the response than the number of nodes. "
>> I don't understand the restriction, though: "don't break the existing
>> API's".
>>
>
> I think he's just saying that we have to make sure we return results in
> token order (and if there's a limit on the query, return the first N
> results when listed in token order).
>
>
>>
>> With non-vnode, it queries a particular node only once. By the way, in
>> the worst case, I understand that a secondary index query sometimes has to
>> scan all the nodes in the cluster (empty table or high-cardinality index?),
>> but I don't understand why vnodes make it scan the *same node* multiple
>> times. I see this behavior even when RF is 1.
>>
>> >> Snippet from output1.txt attached earlier:
>> Executing indexed scan for [min(-9223372036854775808),
>> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
>> Executing indexed scan for (max(-9193352069377957523),
>> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
>> Executing indexed scan for (max(-9136021049555745100),
>> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
>> Executing indexed scan for (max(-8959555493872108621),
>> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
>>
>
> I'm not sure how your question here is different from the one above.
>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
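
Tyler's point above about token order and limits can be sketched roughly as follows. This is plain Python, not Cassandra code: the function name first_n_in_token_order and the row tokens and keys in the example are invented for illustration. The idea shown is only that per-range responses, whatever order they arrive in, must be consumed in token order, and that the scan can stop once the limit is reached.

```python
def first_n_in_token_order(per_range_results, limit):
    """Return the first `limit` rows when ranges are walked in token order.

    per_range_results: list of (range_start_token, rows), where rows is a
    list of (token, key) tuples returned for that range.
    """
    if limit <= 0:
        return []
    out = []
    # Walk the ranges by start token, not by arrival order of the responses.
    for _, rows in sorted(per_range_results, key=lambda r: r[0]):
        for row in sorted(rows):
            out.append(row)
            if len(out) == limit:
                return out  # later ranges never need to be consulted
    return out


# Illustrative responses, deliberately listed out of token order.
responses = [
    (-9136021049555745100, [(-9000000000000000000, "k3")]),
    (-9223372036854775808, [(-9200000000000000000, "k1")]),
    (-9193352069377957523, [(-9150000000000000000, "k2")]),
]
first_two = first_n_in_token_order(responses, 2)
print(first_two)
```

With a limit of 2, only the rows from the two lowest ranges are returned, which is why the order in which ranges are queried matters for the result.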
