When the load is high and you have to restart, do you see any pending 
compactions in `nodetool compactionstats`?

Could you share `nodetool compactionstats` output taken *while* the load is too 
high?  Have you checked the sizes of the SSTables for that big table? Are there 
any unusually large ones?  And what is the Java heap configuration on these nodes?
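For reference, something along these lines should surface all of that (paths 
assume a default package install; the keyspace and table names are placeholders):

    # pending/active compactions while the load is high
    nodetool compactionstats

    # per-table statistics for that big table (SSTable count, latencies, tombstones per slice)
    nodetool cfstats my_keyspace.my_big_table

    # largest SSTables on disk for that table
    ls -lhS /var/lib/cassandra/data/my_keyspace/my_big_table*/*-Data.db | head

    # estimated droppable tombstones per SSTable
    sstablemetadata /var/lib/cassandra/data/my_keyspace/my_big_table*/*-Data.db | grep -i droppable

    # heap sizing currently configured
    grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/cassandra-env.sh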

If you have too many tombstones, I would try decreasing gc_grace_seconds so 
they get purged earlier during compactions.
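For example, something like this (keyspace/table are placeholders and 3600 is 
just an illustration; pick a value that still leaves enough time for repairs 
to complete):

    # lower gc_grace_seconds so tombstones become purgeable sooner
    cqlsh -e "ALTER TABLE my_keyspace.my_big_table WITH gc_grace_seconds = 3600;"

Just keep in mind that every node then needs to be repaired within that window, 
otherwise deleted data can reappear.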

   J.

> On Feb 12, 2016, at 8:45 AM, Skvazh Roman <r...@skvazh.com> wrote:
> 
> There are 1-4 compactions running at that moment.
> We have many tombstones which are not being removed.
> DroppableTombstoneRatio is 5-6 (greater than 1).
> 
>> On 12 Feb 2016, at 15:53, Julien Anguenot <jul...@anguenot.org> wrote:
>> 
>> Hey, 
>> 
>> What about the compaction count when that is happening?
>> 
>>  J.
>> 
>> 
>>> On Feb 12, 2016, at 3:06 AM, Skvazh Roman <r...@skvazh.com> wrote:
>>> 
>>> Hello!
>>> We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB), each with an 
>>> attached 1.5 TB, 4000 PIOPS EBS volume.
>>> Sometimes the user CPU on one or two nodes spikes to 100%, the load average 
>>> climbs to 20-30, and read requests drop off.
>>> Only restarting the Cassandra service on those nodes helps.
>>> Please advise.
>>> 
>>> One big table with wide rows. 600 Gb per node.
>>> LZ4Compressor
>>> LeveledCompaction
>>> 
>>> concurrent compactors: 4
>>> compactor throughput: tried from 16 to 128
>>> concurrent_readers: from 16 to 32
>>> concurrent_writers: 128
>>> 
>>> 
>>> https://gist.github.com/rskvazh/de916327779b98a437a6
>>> 
>>> 
>>> JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 19.35
>>> http://code.google.com/p/jvmtop
>>> 
>>> Profiling PID 9256: org.apache.cassandra.service.CassandraDa
>>> 
>>> 95.73% (     4.31s) ....google.common.collect.AbstractIterator.tryToComputeN()
>>> 1.39% (     0.06s) com.google.common.base.Objects.hashCode()
>>> 1.26% (     0.06s) io.netty.channel.epoll.Native.epollWait()
>>> 0.85% (     0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
>>> 0.46% (     0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
>>> 0.26% (     0.01s) com.google.common.collect.Iterators$7.computeNext()
>>> 0.06% (     0.00s) io.netty.channel.epoll.Native.eventFdWrite()
>>> 
>>> 
>>> ttop:
>>> 
>>> 2016-02-12T08:20:25.605+0000 Process summary
>>> process cpu=1565.15%
>>> application cpu=1314.48% (user=1354.48% sys=-40.00%)
>>> other: cpu=250.67%
>>> heap allocation rate 146mb/s
>>> [000405] user=76.25% sys=-0.54% alloc=     0b/s - SharedPool-Worker-9
>>> [000457] user=75.54% sys=-1.26% alloc=     0b/s - SharedPool-Worker-14
>>> [000451] user=73.52% sys= 0.29% alloc=     0b/s - SharedPool-Worker-16
>>> [000311] user=76.45% sys=-2.99% alloc=     0b/s - SharedPool-Worker-4
>>> [000389] user=70.69% sys= 2.62% alloc=     0b/s - SharedPool-Worker-6
>>> [000388] user=86.95% sys=-14.28% alloc=     0b/s - SharedPool-Worker-5
>>> [000404] user=70.69% sys= 0.10% alloc=     0b/s - SharedPool-Worker-8
>>> [000390] user=72.61% sys=-1.82% alloc=     0b/s - SharedPool-Worker-7
>>> [000255] user=87.86% sys=-17.87% alloc=     0b/s - SharedPool-Worker-1
>>> [000444] user=72.21% sys=-2.30% alloc=     0b/s - SharedPool-Worker-12
>>> [000310] user=71.50% sys=-2.31% alloc=     0b/s - SharedPool-Worker-3
>>> [000445] user=69.68% sys=-0.83% alloc=     0b/s - SharedPool-Worker-13
>>> [000406] user=72.61% sys=-4.40% alloc=     0b/s - SharedPool-Worker-10
>>> [000446] user=69.78% sys=-1.65% alloc=     0b/s - SharedPool-Worker-11
>>> [000452] user=66.86% sys= 0.22% alloc=     0b/s - SharedPool-Worker-15
>>> [000256] user=69.08% sys=-2.42% alloc=     0b/s - SharedPool-Worker-2
>>> [004496] user=29.99% sys= 0.59% alloc=   30mb/s - CompactionExecutor:15
>>> [004906] user=29.49% sys= 0.74% alloc=   39mb/s - CompactionExecutor:16
>>> [010143] user=28.58% sys= 0.25% alloc=   26mb/s - CompactionExecutor:17
>>> [000785] user=27.87% sys= 0.70% alloc=   38mb/s - CompactionExecutor:12
>>> [012723] user= 9.09% sys= 2.46% alloc= 2977kb/s - RMI TCP Connection(2673)-127.0.0.1
>>> [000555] user= 5.35% sys=-0.08% alloc=  474kb/s - SharedPool-Worker-24
>>> [000560] user= 3.94% sys= 0.07% alloc=  434kb/s - SharedPool-Worker-22
>>> [000557] user= 3.94% sys=-0.17% alloc=  339kb/s - SharedPool-Worker-25
>>> [000447] user= 2.73% sys= 0.60% alloc=  436kb/s - SharedPool-Worker-19
>>> [000563] user= 3.33% sys=-0.04% alloc=  460kb/s - SharedPool-Worker-20
>>> [000448] user= 2.73% sys= 0.27% alloc=  414kb/s - SharedPool-Worker-21
>>> [000554] user= 1.72% sys= 0.70% alloc=  232kb/s - SharedPool-Worker-26
>>> [000558] user= 1.41% sys= 0.39% alloc=  213kb/s - SharedPool-Worker-23
>>> [000450] user= 1.41% sys=-0.03% alloc=  158kb/s - SharedPool-Worker-17
>> 


