Re: Long GC due to promotion failures

John Watson Wed, 22 Jan 2014 11:37:03 -0800

LCS does create a lot of SSTables, unfortunately. The nodes are keeping
up with compactions, though.


This started once we began reading from a CF that has tombstones in its rows.

What's even more concerning is that it's continuing even after we stopped
reads and dropped that CF.

On Wed, Jan 22, 2014 at 3:02 AM, Jason Wee <peich...@gmail.com> wrote:
> SSTable count: 365
>
> Your SSTable count is too high... I don't know what the best count should
> be, but in my experience anything below 20 is good. Is your compaction
> running?
>
> I have read a few blog posts on how to read cfhistograms, but never really
> understood it fully. Would anyone care to explain using the OP's attached
> cfhistograms?
>
> Taking a wild shot, perhaps try a different build, Oracle JDK 1.6u25
> perhaps?
>
> HTH
>
> Jason
>
>
>
>
> On Tue, Jan 21, 2014 at 4:02 PM, John Watson <j...@disqus.com> wrote:
>>
>> Pretty reliably, at some point, nodes will have super long GCs,
>> followed by https://issues.apache.org/jira/browse/CASSANDRA-6592
>>
>> Lovely log messages:
>>
>>   9030.798: [ParNew (0: promotion failure size = 4194306)  (2: promotion failure size = 4194306)  (4: promotion failure size = 4194306)  (promotion failed)
>>   Total time for which application threads were stopped: 23.2659990 seconds
>>
>> Full gc.log until just before restarting the node (see another 32s GC
>> near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9
>>
>> Here's a graph of GC time, where we can see an increase 30 minutes
>> prior (an indicator that the issue will happen soon):
>> http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png
>>
>> Graph of various heap usage metrics:
>> http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png
>>
>> Running compactions in the same time frame:
>> http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png
>>
>> CPU, IO, ops and latencies:
>>
>> https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png
>>
>> cfhistograms/cfstats:
>> https://gist.github.com/dctrwatson/9a08b38d0258ae434b15
>>
>> Cassandra 1.2.13
>> Oracle JDK 1.6u45
>>
>> JVM opts:
>>
>> MAX_HEAP_SIZE="8G"
>> HEAP_NEW_SIZE="1536M"
>>
>> Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
>> Tried default "-XX:SurvivorRatio=8" and "-XX:SurvivorRatio=4"
>> Tried default "-XX:MaxTenuringThreshold=1" and
>> "-XX:MaxTenuringThreshold=2"
>>
>> All still eventually ran into long GC.
>>
>> Hardware for all 3 nodes:
>>
>> (2) E5520 @ 2.27GHz (8 cores w/ HT) ["16" cores]
>> (6) 4GB RAM [24G RAM]
>> (1) 500GB 7.2k for commitlog
>> (2) 400G SSD for data (configured as separate data directories)
>
>
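
For reference, the generation sizes implied by the JVM opts quoted above can be worked out by hand. This is only a rough sketch: it assumes the usual HotSpot meaning of -XX:SurvivorRatio (eden size divided by the size of one survivor space, with two survivor spaces in the young generation) and ignores the alignment rounding the JVM applies, so real values will differ slightly. The function name and the MB units are illustrative choices, not anything from cassandra-env.sh.

```python
def young_gen_layout(heap_mb, new_mb, survivor_ratio):
    """Approximate generation sizes in MB for a generational HotSpot heap.

    Assumes young gen = eden + 2 survivor spaces and
    survivor_ratio = eden / one survivor space (alignment ignored).
    """
    survivor_mb = new_mb / (survivor_ratio + 2)
    eden_mb = new_mb - 2 * survivor_mb
    old_mb = heap_mb - new_mb
    return {"eden_mb": eden_mb, "survivor_mb": survivor_mb, "old_mb": old_mb}


# MAX_HEAP_SIZE="8G", HEAP_NEW_SIZE="1536M", with both SurvivorRatio values tried
for ratio in (8, 4):
    print(ratio, young_gen_layout(8192, 1536, ratio))
```

With these settings each survivor space is a few hundred MB at most, while the old generation is about 6.5G in both cases.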

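To pull the promotion failures and long pauses out of a gc.log like the one linked above, something along these lines might help. It is a minimal sketch, not a full GC log parser: the regexes only cover the two message shapes quoted in this thread, the function name and the 1-second threshold are made up for illustration, and the conversion to bytes assumes the reported "size" is in 8-byte heap words on a 64-bit JVM, which is an assumption worth verifying against your JVM.

```python
import re

PROMO_RE = re.compile(r"promotion failure size = (\d+)")
PAUSE_RE = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+)\s+seconds"
)

# Assumption: the logged size is in heap words, 8 bytes each on a 64-bit JVM.
WORD_BYTES = 8


def summarize_gc_log(text, pause_threshold_s=1.0):
    """Count promotion failures and collect pauses at or above the threshold."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    sizes = [int(s) for s in PROMO_RE.findall(flat)]
    pauses = [float(p) for p in PAUSE_RE.findall(flat)]
    return {
        "promotion_failures": len(sizes),
        "largest_failed_bytes": max(sizes, default=0) * WORD_BYTES,
        "long_pauses_s": [p for p in pauses if p >= pause_threshold_s],
    }


# The log excerpt from this thread, wrapped the way the mail client left it.
SAMPLE = """
9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
promotion failure size = 4194306)  (4: promotion failure size =
4194306)  (promotion failed)
Total time for which application threads were stopped: 23.2659990
seconds
"""

print(summarize_gc_log(SAMPLE))
```

If the heap-word assumption holds, each failed promotion in the excerpt was for a single allocation of roughly 32 MB.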