Having a few requests time out while the service detects badness is
typical in this kind of system. I don't think writing a completely
separate StorageProxy + supporting classes to avoid this, in exchange
for RF times the network bandwidth (e.g., with RF=3, shipping three
full copies of each row instead of one), is a good idea.

On Wed, Jul 7, 2010 at 11:23 AM, Mason Hale <ma...@onespot.com> wrote:
> We've been experiencing cluster-wide performance issues whenever any single
> node in the cluster is performing poorly. For example, this occurs when
> compaction is running on any node, or when a new node is being
> bootstrapped.
> We believe the root cause of this issue is a performance optimization in
> Cassandra that requests the "full" data from only a single node in the
> cluster, and MD5 checksums of the same data from all other nodes (depending
> on the consistency level of the read) for a given request. The net effect of
> this optimization is that the read blocks until the data is received from the
> node replying with the full data, even if all other nodes respond much more
> quickly. Thus the entire cluster is only as fast as its slowest node for some
> fraction of all requests. The portion of requests sent to the slow node is a
> function of the total cluster size, replication factor and consistency level.
> For smallish clusters (e.g. 10 or fewer servers) this performance degradation
> can be quite pronounced.
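> To make the mechanism concrete, here is a rough Java sketch of the idea
> (invented names; this is not Cassandra's actual read path):
>
> import java.nio.charset.StandardCharsets;
> import java.security.MessageDigest;
> import java.util.Arrays;
>
> public class DigestReadSketch {
>     static byte[] md5(byte[] data) throws Exception {
>         return MessageDigest.getInstance("MD5").digest(data);
>     }
>
>     public static void main(String[] args) throws Exception {
>         // Pretend these are the serialized rows held by three replicas.
>         byte[] fromA = "row-value-v2".getBytes(StandardCharsets.UTF_8); // full data
>         byte[] fromB = "row-value-v2".getBytes(StandardCharsets.UTF_8); // digest only
>         byte[] fromC = "row-value-v1".getBytes(StandardCharsets.UTF_8); // stale replica
>
>         // B and C send only 16-byte MD5 digests over the wire; A ships the row.
>         byte[] expected = md5(fromA);
>         System.out.println("B matches: " + Arrays.equals(expected, md5(fromB)));
>         // A mismatch would trigger read repair: fetch full data and reconcile.
>         System.out.println("C matches: " + Arrays.equals(expected, md5(fromC)));
>     }
> }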
> CASSANDRA-981 (https://issues.apache.org/jira/browse/CASSANDRA-981)
> discusses this issue and proposes the solution of dynamically identifying
> slow nodes and automatically treating them as if they were on a remote
> network, thus preventing certain performance critical operations (such as
> full data requests) from being performed on that node. This seems like a
> fine solution.
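> As I understand the proposal, the coordinator would keep a running latency
> score per replica and route the full-data request to whichever replica
> currently looks fastest; something along these lines (a toy sketch with
> invented names, not the proposed implementation):
>
> import java.util.*;
>
> public class LatencySnitchSketch {
>     private static final double ALPHA = 0.2; // weight given to the newest sample
>     private final Map<String, Double> avgMs = new HashMap<String, Double>();
>
>     // Fold a new latency sample into an exponentially weighted moving average.
>     void record(String replica, double latencyMs) {
>         Double old = avgMs.get(replica);
>         avgMs.put(replica, old == null ? latencyMs : old + ALPHA * (latencyMs - old));
>     }
>
>     // Fastest replicas first; the full-data request goes to the head of the list.
>     List<String> orderByLatency(List<String> replicas) {
>         List<String> sorted = new ArrayList<String>(replicas);
>         Collections.sort(sorted, new Comparator<String>() {
>             public int compare(String a, String b) {
>                 return Double.compare(score(a), score(b));
>             }
>         });
>         return sorted;
>     }
>
>     private double score(String replica) {
>         Double avg = avgMs.get(replica);
>         return avg == null ? 0.0 : avg; // unknown nodes sort first, so they get sampled
>     }
>
>     public static void main(String[] args) {
>         LatencySnitchSketch snitch = new LatencySnitchSketch();
>         snitch.record("10.0.0.1", 2.0);
>         snitch.record("10.0.0.2", 40.0); // compacting, currently slow
>         snitch.record("10.0.0.3", 3.0);
>         // The slow node sorts last, so it only serves digest requests.
>         System.out.println(snitch.orderByLatency(
>                 Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3")));
>     }
> }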
> However, a design that requires any read operation to wait for the reply from
> one specific node seems counter to the fundamental design goal of avoiding
> single points of failure. In this case, a single node with
> degraded performance (but still online) can dramatically reduce the overall
> performance of the cluster. The proposed solution would dynamically detect
> this condition and take evasive action when the condition is detected, but
> it would require some number of requests to perform poorly before a slow
> node is detected. It also smells like a complex solution that could have
> some unexpected side-effects and edge-cases.
> I wonder if a simpler solution would be more effective here? In the same way
> that hinted handoff can now be disabled via configuration, would it be
> feasible to optionally turn off this optimization? That way I could make the
> trade-off between the incremental performance improvement from this
> optimization and more reliable cluster-wide performance. Ideally, I would be
> able to configure how many nodes should reply with "full data" for each
> request. I could then increase this from 1 to 2 to avoid cluster-wide
> performance degradation when any single node is performing poorly. Being
> able to turn off or tune this setting would also let me do some A/B testing
> to evaluate what performance benefit this optimization actually provides.
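> Concretely, the knob I have in mind would work something like this
> (hypothetical sketch; "dataReplicaCount" is not an existing setting):
>
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class ReadPlanSketch {
>     // Send full-data requests to the first N replicas and digest requests
>     // to the rest; the read can then complete as soon as any one full copy
>     // (plus enough matching digests for the consistency level) arrives.
>     static List<String> planRead(List<String> replicas, int dataReplicaCount) {
>         List<String> plan = new ArrayList<String>();
>         for (int i = 0; i < replicas.size(); i++) {
>             String type = (i < dataReplicaCount) ? "FULL_DATA" : "DIGEST";
>             plan.add(replicas.get(i) + ": " + type);
>         }
>         return plan;
>     }
>
>     public static void main(String[] args) {
>         List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
>         // dataReplicaCount = 2: tolerate one slow replica, at the cost of
>         // shipping each row across the network twice.
>         for (String req : planRead(replicas, 2)) {
>             System.out.println(req);
>         }
>     }
> }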
> I'm curious to know if anyone else has run into this issue, and whether
> anyone else wishes they could turn off or tune this "full data"/MD5
> performance optimization.
> thanks,
> Mason
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
