Re: Blocking behavior of StorageProxy.fetchRows

Jonathan Haddad Fri, 08 Jan 2016 16:48:32 -0800

Use local quorum, don't talk to remote dcs.
On Fri, Jan 8, 2016 at 1:41 PM Dominic Chevalier <[email protected]>
wrote:


> Hello,
>
> tldr;
>
> It looks like StorageProxy.fetchRows blocking for read responses can get
> pretty bad during quorum reads involving many geographically distant data
> centers. If this is true, why doesn't the coordinator handle replies
> asynchronously to keep over all throughput up?
>
> Long;
>
> I'm running apache cassandra 2.0.16 with ~400 nodes total, spread
> throughout 5 AWS regions globally.
>
> I tried running many (hundreds) simultaneous paged range scans over large
> token ranges, 'select * from table where token(partition_key) >= ? and
> token(partition_key) < ?' at consistency level QUORUM. Replication factor
> 3. Row size is small, a few hundred bytes max.
>
> This caused the cassandra nodes in the local data center hosting the
> application to become quite sluggish to other queries. Upon investigation
> of the code, it looks like, and comments say the same, that
> StorageProxy.fetchRows blocks for reads, even if the read comes from a
> remote node.
>
> Based on the behavior I observed, and the impact on other queries, I
> suspected the quorum reads were blocking the read stage executor pool of
> the coordinator nodes.
>
> If I've drawn the correct conclusions, why does the read stage block for
> reads from other nodes, especially nodes in remote datacenter where latency
> is not small, rather than asynchronously processing read replies and
> freeing up the read stage threads?
>
> I came across https://issues.apache.org/jira/browse/CASSANDRA-10989 which
> seems to target performance improvements in the threading model, which made
> me more curious about the above question.
>
> Thoughts and info are greatly appreciated.
>
> Kindly,
> Dominic
>

Re: Blocking behavior of StorageProxy.fetchRows

Reply via email to