That makes sense, but this shouldn't make requests last for the timeout duration -- at quorum, it should be responding to the client as soon as it gets that second-fastest reply. If I'm understanding right that this was making the response to the client block until the overwhelmed node timed out, that's a bug. What version of Cassandra is this?
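For illustration, a rough sketch of what responding at quorum looks like on the coordinator side (hypothetical helper code, not Cassandra's actual read path): the reply goes back to the client once the required number of replicas have answered, and a straggler is only waited on up to the timeout.

import java.util.List;
import java.util.concurrent.*;

// Hypothetical sketch only: wait for `quorum` replies out of however many
// replica reads were issued, then return. A slow replica should not delay
// the client beyond the point where the quorum is already satisfied.
class QuorumWaitSketch {
    static <T> T firstQuorumResult(ExecutorService pool,
                                   List<Callable<T>> replicaReads,
                                   int quorum,
                                   long timeoutMs) throws Exception {
        CompletionService<T> replies = new ExecutorCompletionService<>(pool);
        for (Callable<T> read : replicaReads) {
            replies.submit(read);
        }
        T result = null;
        int received = 0;
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (received < quorum) {
            long remaining = deadline - System.currentTimeMillis();
            Future<T> f = replies.poll(remaining, TimeUnit.MILLISECONDS);
            if (f == null) throw new TimeoutException("quorum not reached");
            result = f.get();
            received++;
        }
        // The slowest replica's reply, if still outstanding, is simply ignored.
        return result;
    }
}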
On Fri, Dec 3, 2010 at 7:27 PM, Daniel Doubleday <daniel.double...@gmx.net> wrote:
> Yes.
>
> I thought that would make sense, no? I guessed that the quorum read would
> force the slowest of the 3 nodes to keep pace with the faster ones. But it
> can't, no matter how small the performance difference is. So it will just
> fill up.
>
> Also, when saying 'practically dead' and 'never recovers' I meant for the
> time I kept the reads up. As soon as I stopped the scan it recovered. It
> just was not able to recover during the load, because for that it would
> have to become faster than the other nodes, and with full queues that just
> wouldn't happen.
>
> By changing the node for every read I would hit the slower node every
> couple of reads. This forced the client to wait for the slower node.
>
> I guess to change that behavior you would need to use something like the
> dynamic snitch and ask only as many peer nodes as necessary to satisfy the
> quorum, asking other nodes only when reads fail. But that would probably
> increase latency and cause other problems. Since you probably don't want
> to run the cluster at a load at which the weakest node of a replication
> group can't keep up, I don't think this is an issue at all.
>
> Just wanted to prevent others from shooting themselves in the foot as I did.
>
> On 03.12.10 23:36, Jonathan Ellis wrote:
>>
>> Am I understanding correctly that you had all connections going to one
>> cassandra node, which caused one of the *other* nodes to die, and
>> spreading the connections around the cluster fixed it?
>>
>> On Fri, Dec 3, 2010 at 4:00 AM, Daniel Doubleday
>> <daniel.double...@gmx.net> wrote:
>>>
>>> Hi all
>>>
>>> I found an anti-pattern the other day which I wanted to share, although
>>> it's a pretty special case.
>>>
>>> Special case because our production cluster is somewhat unusual: 3
>>> servers, RF = 3. We do consistent reads/writes with quorum.
>>>
>>> I did a long-running read series (loads of reads, as fast as I can) with
>>> one connection. Since all queries could be handled by that node, the
>>> overall latency is determined by its own latency and that of the faster
>>> of the other two nodes (because the quorum is satisfied with 2 reads).
>>> What will happen then is that after a couple of minutes one of the other
>>> two nodes goes into 100% I/O wait and drops most of its read messages,
>>> leaving it practically dead while the other 2 nodes keep responding at an
>>> average of ~10ms. The node that died was only a little slower (~13ms
>>> average) but it would inevitably queue up messages. Average response time
>>> increases to the timeout (10 secs) flat. It never recovers.
>>>
>>> It happened every time, and it wasn't always the same node that died.
>>>
>>> The solution was to return the connection to the pool and get a new one
>>> for every read, balancing the load on the client side.
>>>
>>> Obviously this will not happen in a cluster where the percentage of all
>>> rows held by any one node is small enough. But the same thing will
>>> probably happen if you scan by contiguous tokens (meaning that you read
>>> from the same node for a long time).
>>>
>>> Cheers,
>>>
>>> Daniel Doubleday
>>> smeet.com, Berlin
>>
>>
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
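A minimal sketch of the client-side workaround Daniel describes (hand the connection back and pick a different node for every read). The ConnectionPool and Connection types below are hypothetical stand-ins for whatever Thrift client library is in use; only the rotation logic is the point.

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side round-robin: instead of pinning one connection
// for the whole scan, borrow a connection to a different host for every read.
class BalancedReader {
    interface Connection { byte[] get(byte[] key); }
    interface ConnectionPool {
        Connection borrow(String host);
        void release(Connection c);
    }

    private final ConnectionPool pool;
    private final List<String> hosts;             // all nodes in the cluster
    private final AtomicInteger counter = new AtomicInteger();

    BalancedReader(ConnectionPool pool, List<String> hosts) {
        this.pool = pool;
        this.hosts = hosts;
    }

    byte[] read(byte[] key) {
        // Rotate coordinators so no single node fronts the entire scan.
        String host = hosts.get(Math.floorMod(counter.getAndIncrement(), hosts.size()));
        Connection c = pool.borrow(host);
        try {
            return c.get(key);
        } finally {
            pool.release(c);                      // hand it back after every read
        }
    }
}

Rotating coordinators this way also means the client occasionally waits on the slowest replica directly, which is the back-pressure Daniel mentions ("this forced the client to wait for the slower node") that kept that node from queueing up.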