On Fri, Jun 14, 2013 at 7:19 AM, James Lee <james....@metaswitch.com> wrote:
> I’m seeing generally good performance, but with periods where the Cassandra
> node entirely stops responding to write requests for several seconds at a
> time. I don’t have much experience of Cassandra performance tuning, and
> would very much appreciate some pointers on what I can do to improve matters.

It is relatively common for a Cassandra node to become unresponsive for a few
seconds when doing various things. However, as one usually has multiple
replicas for any given key, this transient unavailability does not
meaningfully impact overall availability. Pausing for more than a few seconds
is relatively uncommon and probably does indicate either suboptimal
configuration or excessive workload.

> -- I've used a RAID array for the data directory to improve write
> performance. This significantly reduces the length of the slow period (from
> ~10s to ~2s), but doesn't eliminate it. I've tried RAID10 and RAID0 using
> varying number of drives, but there doesn't seem to be a significant
> difference between the two.

Do you see disk saturation when you're flushing? This statement suggests that
you might be.

> -- I've used multiple drives for the data directory, symlinking the
> directories for different keyspaces to different drives. That didn't
> improve things significantly compared to using a single drive.

I would not expect this to improve things if you are bound by how quickly you
can flush from a single thread.

Stock questions:

1) What JVM?
2) What heap settings?
3) Do you also see GC logs around flush time?
4) Are you testing a single node only?

=Rob