You're right, this option is not new, sorry. Could this option still be useful?
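For reference, tracing can also be requested per query from the client side, without touching the cluster-wide trace probability. Below is a minimal sketch using the DataStax Python driver (the driver already used in this thread); the contact point, keyspace, table, and id are placeholders, not values from this cluster:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])          # placeholder: a local seed
    session = cluster.connect('my_keyspace')  # placeholder keyspace

    # trace=True records a trace for this query only; the server-side
    # "nodetool settraceprobability" setting is left untouched.
    rows = session.execute("SELECT * FROM my_table WHERE id = 42", trace=True)

    trace = rows.get_query_trace()
    for event in trace.events:
        # event.source is the node that emitted the event; a remote-DC IP
        # here means the read left the local DC.
        print(event.source, event.source_elapsed, event.description)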
On Sun, Aug 7, 2022, at 22:18, Bowen Song via user <user@cassandra.apache.org> wrote:

> Do you mean "nodetool settraceprobability"? This is not exactly new; I
> remember it was available in Cassandra 2.x.
>
> On 07/08/2022 20:43, Stéphane Alleaume wrote:
>
> I think perhaps you already know, but I read that you can now trace only
> a % of all queries. I will try to find the name of this functionality
> (in a newer Cassandra release).
>
> Hope it will help.
> Kind regards,
> Stéphane
>
> On Sun, Aug 7, 2022, at 20:26, Raphael Mazelier <r...@futomaki.net> wrote:
>
>> > "Read repair is in the blocking read path for the query, yep"
>>
>> OK, interesting. This is not what I understood from the documentation.
>> And I use the LOCAL_ONE consistency level.
>>
>> I enabled tracing (see the attachment of my first msg), but I didn't
>> see read repair in the trace (and btw I tried to completely disable it
>> on my table by setting both read_repair_chance and
>> dclocal_read_repair_chance to 0).
>>
>> The problem when enabling tracing in cqlsh is that I only get slow
>> results. To see fast answers I need to iterate faster on my queries.
>>
>> I can provide traces again for analysis. I got something more readable
>> in Python.
>>
>> Best,
>>
>> --
>>
>> Raphael
>>
>> On 07/08/2022 19:30, C. Scott Andreas wrote:
>>
>> > but still, as I understand the documentation, the read repair should
>> > not be in the blocking path of a query?
>>
>> Read repair is in the blocking read path for the query, yep. At quorum
>> consistency levels, the read repair must complete before returning a
>> result to the client, to ensure the data returned would be visible on
>> subsequent reads that address the remainder of the quorum.
>>
>> If you enable tracing - either for a single CQL statement that is
>> expected to be slow, or probabilistically from the server side to catch
>> a slow query in the act - that will help identify what's happening.
>>
>> - Scott
>>
>> On Aug 7, 2022, at 10:25 AM, Raphael Mazelier <r...@futomaki.net> wrote:
>>
>> Nope. And what really puzzles me is that the traces clearly show the
>> difference between queries. The fast queries only request a read from
>> one replica, while the slow queries request reads from multiple
>> replicas (and not only ones local to the DC).
>>
>> On 07/08/2022 14:02, Stéphane Alleaume wrote:
>>
>> Hi,
>>
>> Is there some GC which could affect the coordinator node?
>>
>> Kind regards,
>> Stéphane
>>
>> On Sun, Aug 7, 2022, at 13:41, Raphael Mazelier <r...@futomaki.net> wrote:
>>
>>> Thanks for the answer, but I was well aware of this. I use LOCAL_ONE
>>> as the consistency level.
>>>
>>> My client connects to a local seed, then chooses a local coordinator
>>> (as far as I can understand from the trace log).
>>>
>>> Then, for a batch of requests, I get approximately 98% of requests
>>> handled in 2-3 ms in the local DC with one read request, and 2%
>>> handled by many nodes (according to the trace) and therefore way
>>> longer (250 ms).
>>>
>>> ?
>>>
>>> On 06/08/2022 14:30, Bowen Song via user wrote:
>>>
>>> See the diagram below. Your problem almost certainly arises from step
>>> 4, in which an incorrect consistency level set by the client caused
>>> the coordinator node to send the READ command to nodes in other DCs.
>>>
>>> The load balancing policy only affects steps 2 and 3, not steps 1 or 4.
>>>
>>> You should change the consistency level to LOCAL_ONE/LOCAL_QUORUM/etc.
>>> to fix the problem.
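A minimal sketch of what Bowen suggests here, using the DataStax Python driver: pin both the load balancing policy and the default consistency level to the local DC. The contact point is a placeholder; 'eu-west-1' is one of the DC names from the keyspace described later in the thread:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        # steps 2/3: pick a token-aware coordinator in the local DC
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc='eu-west-1')),
        # step 4: keep the coordinator's replica reads inside the local DC
        consistency_level=ConsistencyLevel.LOCAL_ONE,
    )
    cluster = Cluster(['10.0.0.1'],  # placeholder: a local seed
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()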
>>> On 05/08/2022 22:54, Bowen Song wrote:
>>>
>>> The DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls which
>>> Cassandra coordinator node the client sends queries to - not the nodes
>>> it connects to, nor the nodes that perform the actual read.
>>>
>>> A client sends a CQL read query to a coordinator node; the coordinator
>>> node parses the CQL query and sends READ requests to other nodes in
>>> the cluster based on the consistency level.
>>>
>>> Have you checked the consistency level of the session (and of the
>>> query, if applicable)? Is it prefixed with "LOCAL_"? If not, the
>>> coordinator will send the READ requests to non-local DCs.
>>>
>>> On 05/08/2022 19:40, Raphael Mazelier wrote:
>>>
>>> Hi Cassandra Users,
>>>
>>> I'm relatively new to Cassandra, and first I have to say I'm really
>>> impressed by the technology.
>>>
>>> Good design, and a lot of material to understand the internals (the
>>> O'Reilly book helps a lot, as do The Last Pickle blog posts).
>>>
>>> I have a multi-datacenter C* cluster (US, Europe, Singapore) with
>>> eight nodes in each region (two seeds in each), two racks in EU and
>>> Singapore, and three in the US. Everything is deployed in AWS.
>>>
>>> We have a keyspace configured with the network topology strategy and
>>> two replicas in every region, like this:
>>> {'class': 'NetworkTopologyStrategy', 'ap-southeast-1': '2',
>>> 'eu-west-1': '2', 'us-east-1': '2'}
>>>
>>> Investigating a performance issue, I noticed strange things in my
>>> experiments.
>>>
>>> What we expect is very low latency, 3-5 ms max, for this specific
>>> SELECT query. So we want every read to be local to each datacenter.
>>>
>>> We configure DCAwareRoundRobinPolicy(local_dc=DC) in Python, and the
>>> same in Go:
>>> gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC")).
>>>
>>> Testing a bit with two short programs (I can provide them) in Go and
>>> Python, I noticed very strange results. Basically, I run the same
>>> query over and over against a very limited set of ids.
>>>
>>> The first results were surprising, because the very first queries
>>> always took more than 250 ms, and after stressing C* a bit (playing
>>> with the sleep between queries) I can achieve a good ratio of queries
>>> at 3-4 ms (what I expected).
>>>
>>> My guess was that the long queries were somehow executed non-locally
>>> (or at least involved multi-datacenter requests) and the short ones
>>> were not.
>>>
>>> Activating tracing in my programs (like enabling tracing in cqlsh)
>>> kind of confirms my suspicion.
>>>
>>> (I will provide traces in attachment.)
>>>
>>> My questions: why does C* sometimes try to read non-locally? How can
>>> we disable this? What is the criterion for it?
>>>
>>> (Btw, I'm really not a fan of this multi-region design, for these very
>>> specific kinds of issues...)
>>>
>>> Also, a side question: why is C* so slow to connect? It's as if it's
>>> trying to reach every node in each DC (we only provide local seeds,
>>> however). Sometimes it takes more than 20 seconds...
>>>
>>> Any help appreciated.
>>>
>>> Best,
>>>
>>> --
>>>
>>> Raphael Mazelier
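For completeness, a sketch of the kind of test loop Raphael describes (run the same query over and over, flag the slow outliers, and print their traces to see which nodes served them), assuming a `session` built as in the earlier sketch; the table name, id set, and 50 ms threshold are made up for illustration:

    import time

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    stmt = SimpleStatement(
        "SELECT * FROM my_table WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_ONE,
    )

    for key in [1, 2, 3] * 100:                # small, fixed set of ids
        start = time.perf_counter()
        rows = session.execute(stmt, (key,), trace=True)
        elapsed_ms = (time.perf_counter() - start) * 1000

        if elapsed_ms > 50:                    # arbitrary "slow" threshold
            trace = rows.get_query_trace()
            print(f"slow query ({elapsed_ms:.1f} ms) for id={key}:")
            for event in trace.events:
                # a remote-DC IP here means the read left the local DC
                print(f"  {event.source} {event.source_elapsed} {event.description}")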