Re: Blocking behavior of StorageProxy.fetchRows

Russell Bradberry Fri, 08 Jan 2016 17:04:21 -0800

While LOCAL_QUORUM may be a solution in a lot of use cases, there are 
definitely use cases where reading at full QUORUM is required: think finance, 
medical, military. For these use cases, I think non-blocking behavior would be 
an incredible improvement in performance. Even for LOCAL_QUORUM it would be a 
great improvement.

Simply saying "use LOCAL_QUORUM" makes it seem like this use case is 
meritless, when it surely is not. Plus, anything that improves performance is, 
in my opinion, a good idea. Whether it is worth the development investment is 
not something I can speak to.
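
To make the QUORUM versus LOCAL_QUORUM trade-off concrete, here is a small sketch of the replica arithmetic. It assumes a replication factor of 3 in each of five data centers; the thread only says "Replication factor 3", so the per-data-center layout is an assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Cassandra.
class QuorumMath {
    // QUORUM needs a majority of all replicas across every data center.
    static int quorum(int[] replicasPerDc) {
        int total = Arrays.stream(replicasPerDc).sum();
        return total / 2 + 1;
    }

    // LOCAL_QUORUM needs a majority of the replicas in the coordinator's data center only.
    static int localQuorum(int localReplicas) {
        return localReplicas / 2 + 1;
    }

    // Fewest data centers that can satisfy QUORUM, counting the largest ones first.
    static int minDataCentersForQuorum(int[] replicasPerDc) {
        int[] sorted = replicasPerDc.clone();
        Arrays.sort(sorted);
        int needed = quorum(replicasPerDc);
        int covered = 0;
        int dcs = 0;
        for (int i = sorted.length - 1; i >= 0 && covered < needed; i--) {
            covered += sorted[i];
            dcs++;
        }
        return dcs;
    }

    public static void main(String[] args) {
        int[] layout = {3, 3, 3, 3, 3}; // assumed: RF 3 in each of 5 regions
        System.out.println("QUORUM replicas:       " + quorum(layout));                  // 8
        System.out.println("LOCAL_QUORUM replicas: " + localQuorum(3));                  // 2
        System.out.println("DCs needed for QUORUM: " + minDataCentersForQuorum(layout)); // 3
    }
}
```

Under that assumed layout, every QUORUM read waits on replicas in at least three data centers, so at least two cross-region round trips sit on the critical path, while a LOCAL_QUORUM read never leaves the local region.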

On Fri, Jan 8, 2016 at 4:48 PM -0800, "Jonathan Haddad" <j...@jonhaddad.com> 
wrote:

Use LOCAL_QUORUM; don't talk to remote DCs.

On Fri, Jan 8, 2016 at 1:41 PM Dominic Chevalier 
wrote:

> Hello,
>
> tldr;
>
> It looks like StorageProxy.fetchRows blocking for read responses can get
> pretty bad during quorum reads involving many geographically distant data
> centers. If this is true, why doesn't the coordinator handle replies
> asynchronously to keep overall throughput up?
>
> Long;
>
> I'm running Apache Cassandra 2.0.16 with ~400 nodes total, spread
> throughout 5 AWS regions globally.
>
> I tried running many (hundreds of) simultaneous paged range scans over large
> token ranges,
>
>   select * from table where token(partition_key) >= ? and
>   token(partition_key) < ?
>
> at consistency level QUORUM, with replication factor 3. Row size is small,
> a few hundred bytes max.
>
> This caused the Cassandra nodes in the local data center hosting the
> application to become quite sluggish for other queries. From reading the
> code, it looks like (and the comments say the same) StorageProxy.fetchRows
> blocks for reads, even if the read comes from a remote node.
>
> Based on the behavior I observed, and the impact on other queries, I
> suspected the quorum reads were blocking the read stage executor pool of
> the coordinator nodes.
>
> If I've drawn the correct conclusions, why does the read stage block for
> reads from other nodes, especially nodes in remote data centers where latency
> is not small, rather than asynchronously processing read replies and
> freeing up the read stage threads?
>
> I came across https://issues.apache.org/jira/browse/CASSANDRA-10989 which
> seems to target performance improvements in the threading model, which made
> me more curious about the above question.
>
> Thoughts and info are greatly appreciated.
>
> Kindly,
> Dominic
>
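
The question in the quoted message comes down to whether a bounded pool of stage threads waits on remote replies or is released while they are in flight. The following is a minimal sketch of that difference, not Cassandra's actual code: every name is hypothetical, and the remote replica is simulated by a scheduled delay rather than a real network call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Cassandra.
class CoordinatorSketch {
    // Stands in for the network: completes a reply after a simulated round trip.
    static final ScheduledExecutorService NETWORK =
        Executors.newScheduledThreadPool(1, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });

    static CompletableFuture<String> remoteRead(String key, long rttMillis) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        NETWORK.schedule(() -> reply.complete("row:" + key), rttMillis, TimeUnit.MILLISECONDS);
        return reply;
    }

    // Blocking style: each stage thread is held for the whole round trip.
    static long runBlocking(int requests, int stageThreads, long rttMillis) {
        ExecutorService stage = Executors.newFixedThreadPool(stageThreads);
        long start = System.nanoTime();
        List<CompletableFuture<String>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            String key = "k" + i;
            results.add(CompletableFuture.supplyAsync(
                () -> remoteRead(key, rttMillis).join().toUpperCase(), stage));
        }
        for (CompletableFuture<String> f : results) f.join();
        stage.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Asynchronous style: a stage thread sends the request and returns at once;
    // another stage task is scheduled only when the reply has arrived.
    static long runAsync(int requests, int stageThreads, long rttMillis) {
        ExecutorService stage = Executors.newFixedThreadPool(stageThreads);
        long start = System.nanoTime();
        List<CompletableFuture<String>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            String key = "k" + i;
            results.add(CompletableFuture
                .supplyAsync(() -> remoteRead(key, rttMillis), stage) // send only
                .thenCompose(reply -> reply)                          // no thread waits here
                .thenApplyAsync(String::toUpperCase, stage));         // process the reply
        }
        for (CompletableFuture<String> f : results) f.join();
        stage.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 8 reads, 2 stage threads, 100 ms simulated cross-region round trip.
        System.out.println("blocking ms: " + runBlocking(8, 2, 100));
        System.out.println("async ms:    " + runAsync(8, 2, 100));
    }
}
```

With two stage threads and eight reads, the blocking variant needs four sequential round trips, because only two reads can be outstanding at a time. The asynchronous variant issues all eight up front and finishes in roughly one round trip. The trade-off is that the asynchronous style must bound the number of outstanding requests some other way, since the thread pool size no longer does so implicitly.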
