Hi Rob,

Thanks for your response!
> Unthrottle compaction, that's an insane number of SSTables.

I set compaction_throughput_mb_per_sec to 0 and restarted Cassandra. Somehow I don't see heavy GC anymore, but compaction is still very slow and IO is still not a bottleneck: iotop reports ~400 K/s for disk read, and disk write is 0 with occasional spikes to ~500 K/s. Only one thread is doing IO, which is expected since compaction is single-threaded by default. CPU load is ~115% (where 400% is the max, since the machine has 4 cores).

top reports the following numbers:

  Cpu(s): 3.3%us, 1.0%sy, 25.4%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.4%si, 0.1%st

nodetool gcstats reports the following:

  Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections
  31531          24                   354                    3                      7379325584         23

Before, Interval (ms) and Total GC Elapsed (ms) were almost the same; now it looks like GC still happens a lot, but each collection finishes really quickly. It looks to me like the compaction thread is busy doing something that produces quite a lot of garbage for GC to collect. Compaction performance is bounded by CPU, which is a surprise to me: I would expect disk IO to be the bottleneck. I don't think increasing the heap size is going to help in this case, what do you think? Should I try the JMX call userDefinedCompaction?

> Are you using LCS or STS compaction?

We are using STS compaction.

On Tue, Dec 30, 2014 at 4:30 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Dec 30, 2014 at 3:12 PM, Mikhail Strebkov <streb...@gmail.com>
> wrote:
>
>> We have a table in our production Cassandra that is stored on 11369
>> SSTables. The average SSTable count for the other tables is around 15,
>> and the read latency for them is much smaller.
>
> Unthrottle compaction, that's an insane number of SSTables.
>
>> I tried to run manual compaction (nodetool compact my_keyspace my_table)
>> but then the node starts spending ~90% of the time in GC and compaction
>> advances super slowly (it would take a couple of weeks to finish). I
>> checked IO stats with "iotop" and there is almost no IO going on.
>
> Are you using LCS or STS compaction?
>
>> We're running Cassandra on EC2 (m1.xlarge) which has 15G of memory,
>> using the DataStax Community AMI. Our Cassandra version is 2.1.2. We
>> didn't change the Cassandra configuration from the default in the AMI,
>> so Cassandra calculated 3760M for the heap size.
>
> One solution would be to temporarily increase heap, though going above
> 8gb or so will increase the duration of GCs, approaching seconds.
>
> Another alternative is to use the JMX call userDefinedCompaction to do a
> compaction that is less major.
>
>> Why does Cassandra fall into this "90% CPU time in GC" state and how
>> can I tune Cassandra so that it can finish the compaction successfully?
>
> Because 12,000 sstables use a lot of heap, and you only have ~4gb of
> heap. You go into "GC pre-fail" because you can't reclaim enough heap.
>
> =Rob
> http://twitter.com/rcolidba
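For anyone landing on this thread: the userDefinedCompaction route Rob suggests is an operation on the CompactionManager MBean that takes a comma-separated list of SSTable Data files, so you can compact the 11k+ SSTables in small batches instead of one major compaction. A minimal sketch of planning such batches is below; the directory layout, batch size, and the printed jmxterm-style command are illustrative assumptions (the operation's exact name and signature vary across Cassandra versions, so verify against your node's MBeans first), and nothing here actually touches JMX:

```shell
# plan_compaction_batches DIR BATCH
# Dry run only: finds *-Data.db SSTable files in DIR, groups them BATCH at a
# time into comma-separated lists, and prints one jmxterm-style command per
# group. The MBean/operation name below is an assumption to be checked
# against your Cassandra version; the commands are printed, never executed.
plan_compaction_batches() {
  dir=$1
  batch=$2
  find "$dir" -maxdepth 1 -name '*-Data.db' -printf '%f\n' | sort |
    xargs -r -n "$batch" | tr ' ' ',' |
    while read -r group; do
      printf 'run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction %s\n' "$group"
    done
}
```

Usage would be something like `plan_compaction_batches /var/lib/cassandra/data/my_keyspace/my_table 32` (hypothetical path), then feeding each printed line to a JMX client such as jmxterm once you've confirmed the operation exists on your node. Batching keeps each compaction's working set, and hence heap pressure, bounded.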