LCS does create a lot of SSTables unfortunately. The nodes are keeping up on compactions though.
This started after starting to read from a CF that has tombstones in its rows. What's even more concerning, is it's continuing even after stopping reads and dropping that CF. On Wed, Jan 22, 2014 at 3:02 AM, Jason Wee <peich...@gmail.com> wrote: > SSTable count: 365 > > Your sstable counts are too many... don't know what is the best count should > be but for my experience, anything below 20 are good. Is your compaction > running? > > I read on a few blog on how should we read cfhistograms, but never really > understood fully. Anyone care to explain using OP attached cfhistogram ? > > Taking a wild shot, perhaps trying different build, oracle jdk 1.6u25 > perhaps? > > HTH > > Jason > > > > > On Tue, Jan 21, 2014 at 4:02 PM, John Watson <j...@disqus.com> wrote: >> >> Pretty reliable, at some point, nodes will have super long GCs. >> Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592 >> >> Lovely log messages: >> >> 9030.798: [ParNew (0: promotion failure size = 4194306) (2: >> promotion failure size = 4194306) (4: promotion failure size = >> 4194306) (promotion failed) >> Total time for which application threads were stopped: 23.2659990 >> seconds >> >> Full gc.log until just before restarting the node (see another 32s GC >> near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9 >> >> Here's graph of GC time, where we can see a an increase 30 minutes >> prior (indicator that the issue will happen soon): >> http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png >> >> Graph of various Heap usage: >> http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png >> >> Running compactions in the same time frame: >> http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png >> >> CPU, IO, ops and latencies: >> >> https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png >> >> cfhistograms/cfstats: >> https://gist.github.com/dctrwatson/9a08b38d0258ae434b15 >> >> Cassandra 1.2.13 >> Oracle JDK 1.6u45 >> >> JVM opts: >> >> MAX_HEAP_SIZE="8G" >> HEAP_NEW_SIZE="1536M" >> >> Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M >> Tried default "-XX:SurvivorRatio=8" and "-XX:SurvivorRatio=4" >> Tried default "-XX:MaxTenuringThreshold=1" and >> "-XX:MaxTenuringThreshold=2" >> >> All still eventually ran into long GC. >> >> Hardware for all 3 nodes: >> >> (2) E5520 @ 2.27Ghz (8 cores w/ HT) ["16" cores] >> (6) 4GB RAM [24G RAM] >> (1) 500GB 7.2k for commitlog >> (2) 400G SSD for data (configured as separate data directories) > >