Our Cassandra client fails over to another node if a node times out. Aside from actual failure, repairs and major compactions can make a node so slow that it affects application performance.
One problem we've run into is that a node in the midst of a repair will still have requests routed to it internally, even if all clients have failed over. With a small number of nodes, this has a major impact on the performance of the overall system. Does anyone have recommendations on tuning this behaviour? It would be really nice not to route requests to an insanely slow node.
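
To make the behaviour I'm after concrete, here is a rough sketch in Python. It is not Cassandra code or any real driver API: LatencyTracker, record, pick, and the alpha and badness_threshold values are all names and numbers I made up for illustration. The idea is to keep a smoothed latency per node and skip any replica that is much slower than the fastest one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LatencyTracker:
    """Hypothetical helper: keeps a smoothed latency score per node."""
    alpha: float = 0.2               # EWMA smoothing factor (assumed value)
    badness_threshold: float = 3.0   # skip nodes this many times slower than the best (assumed)
    scores: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, latency_ms: float) -> None:
        """Fold one observed request latency into the node's running score."""
        prev = self.scores.get(node)
        if prev is None:
            self.scores[node] = latency_ms
        else:
            self.scores[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self, replicas: List[str]) -> List[str]:
        """Return replicas ordered fastest first, dropping very slow ones."""
        # Nodes with no samples yet score 0.0 and therefore sort first.
        ordered = sorted(replicas, key=lambda n: self.scores.get(n, 0.0))
        best = self.scores.get(ordered[0], 0.0)
        if best <= 0.0:
            # No usable baseline yet, so do not exclude anyone.
            return ordered
        limit = best * self.badness_threshold
        return [n for n in ordered if self.scores.get(n, 0.0) <= limit]


tracker = LatencyTracker()
tracker.record("node-a", 5.0)
tracker.record("node-b", 6.0)
tracker.record("node-c", 500.0)   # e.g. a node in the middle of a repair
candidates = tracker.pick(["node-c", "node-a", "node-b"])
```

With these samples, node-c is left out of candidates until its score recovers. Whether something like this can be tuned inside the cluster itself, rather than only in the client, is really what I'm asking.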