Hi all,

Running Cassandra 1.0.7, I recently changed a few read-heavy column families from SizeTieredCompactionStrategy to LeveledCompactionStrategy and added SnappyCompressor, all with the defaults, so 5MB SSTables and, if memory serves, a 64KB chunk size for compression. The results were amazingly good: my data size halved, and my heap usage and performance stabilised nicely, until it came time to run a repair.
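For reference, the change was a per-CF schema update done in cassandra-cli, roughly like the below. The CF name is just a placeholder, I actually left the options at their defaults rather than spelling them out, and I'm writing the syntax from memory, so check it against "help update column family" in the cli before copying it:

    update column family MyReadHeavyCF
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};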
When a repair isn't running I see a saw-toothed pattern on my heap graphs, with CMS clearing out about 1.5GB each GC run; the CMS collection shows up as a sudden vertical drop on the old-gen usage graph. In addition to what I consider healthy heap usage, my ParNew and CMS collections run far quicker than before I made the changes.

However, when I run a repair the CMS graph no longer shows sudden drops but rather gradual slopes, and each GC only manages to clear around 300MB. This happens on two other nodes in the cluster at around the same time, which I assume is because they hold the replicas (we use 3 replicas). ParNew collections look about the same on my graphs with or without a repair running, so no trouble there as far as I can tell.

The symptom of the memory pressure during repair is that either the node running the repair or one of the two replicas tends to perform badly, with the read stage backing up into the thousands at times. If I run a repair on more than one or two nodes at the same time (it's a 7-node cluster), the memory pressure is so bad that half the cluster ends up OOMing, and that happened off-peak, when we're handling about half our peak read load, so the cluster wasn't particularly busy.

The question I'm asking is: has anyone run into this behaviour before, and if so, how did you deal with it? Once I have nursed the cluster through the repair it's currently running, I'll turn off compression on one of my larger CFs to see if it makes a difference, and I'll send the results of that test tomorrow.
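For that test, the plan is just another cli schema change on the one CF; again the name below is a placeholder and the syntax is from memory, so I'd verify it with "help update column family" first:

    update column family MyLargeCF with compression_options = null;

As I understand it, only SSTables written after the change (or rewritten by compaction/scrub) end up uncompressed, so the effect will take a while to show up on disk, but reads should start hitting uncompressed data as new SSTables appear.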