Thanks for the link! Luckily, the cluster I'm running is not yet in production and only holds dummy data, so I will throw that jar on the nodes and let you know how things shake out.
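For reference, a minimal Python sketch of the backup step suggested in the quoted message (copy the *Filter.db files somewhere safe before removing them). The function name and the backup location are hypothetical. It assumes the node is stopped, and it only copies files, leaving removal as a separate manual step.

```python
import shutil
from pathlib import Path


def backup_filter_files(data_dir, backup_dir):
    """Copy every *Filter.db file under data_dir into backup_dir,
    preserving the relative directory layout. Returns the copied paths."""
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    copied = []
    for src in sorted(data_dir.rglob("*Filter.db")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(data_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied
```

On these nodes the call would look like backup_filter_files("/usr/local/cassandra/data", "/var/tmp/filter-db-backup"), where the first path comes from the -Dcassandra.storagedir flag in the ps output quoted below and the second is an arbitrary choice outside the data directory.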

On Sun, Mar 13, 2016 at 11:02 PM, Paulo Motta <pauloricard...@gmail.com> wrote:

> You could be hitting CASSANDRA-11344
> (https://issues.apache.org/jira/browse/CASSANDRA-11344). If that's the
> case, you may try to replace your cassandra jar on an affected node with a
> version with this fix in place and force bloom filter regeneration to see
> if it fixes your problem. You can build with "ant jar" from this branch:
> https://github.com/pauloricardomg/cassandra/tree/3.4-11344
>
> You can force bloom filter regeneration by either removing your *Filter.db
> files (make sure to back them up first for safety) or changing the
> bloom_filter_fp_chance before restarting affected nodes with the fixed jar.
>
> 2016-03-13 19:51 GMT-03:00 Adam Plumb <apl...@fiksu.com>:
>
>> So it's looking like the bloom filter off heap memory usage is ramping up
>> and up until the OOM killer kills the java process. I relaunched on
>> instances with 60GB of memory and the same thing is happening. A node will
>> start using more and more RAM until the process is killed, then another
>> node will start using more and more until it is also killed.
>>
>> Is this the expected behavior? It doesn't seem ideal to me. Is there
>> anything obvious that I'm doing wrong?
>>
>> On Fri, Mar 11, 2016 at 11:31 AM, Adam Plumb <apl...@fiksu.com> wrote:
>>
>>> Here is the creation syntax for the entire schema. The xyz table has
>>> about 2.1 billion keys and the def table has about 230 million keys. Max
>>> row size is about 3KB, mean row size is 700B.
>>>
>>>> CREATE KEYSPACE abc WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};
>>>> CREATE TABLE xyz (
>>>>   id text,
>>>>   secondary_id int,
>>>>   data text,
>>>>   PRIMARY KEY(id)
>>>> )
>>>> WITH
>>>>   compaction = { 'class': 'LeveledCompactionStrategy' }
>>>>   and compression = {'class': 'LZ4Compressor'};
>>>> CREATE INDEX secondary_id_index ON abc.xyz (secondary_id);
>>>> CREATE TABLE def (
>>>>   id text,
>>>>   secondary_id int,
>>>>   data text,
>>>>   PRIMARY KEY(id)
>>>> )
>>>> WITH
>>>>   compaction = { 'class': 'LeveledCompactionStrategy' }
>>>>   and compression = {'class': 'LZ4Compressor'};
>>>> CREATE INDEX secondary_id_index_def ON abc.def (secondary_id);
>>>
>>>
>>> On Fri, Mar 11, 2016 at 11:24 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>
>>>> What is your schema and data like - in particular, how wide are your
>>>> partitions (number of rows and typical row size)?
>>>>
>>>> Maybe you just need (a lot) more heap for rows during the repair
>>>> process.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Fri, Mar 11, 2016 at 11:19 AM, Adam Plumb <apl...@fiksu.com> wrote:
>>>>
>>>>> These are brand new boxes only running Cassandra. Yeah the kernel is
>>>>> what is killing the JVM, and this does appear to be a memory leak in
>>>>> Cassandra. And Cassandra is the only thing running, aside from the basic
>>>>> services needed for Amazon Linux to run.
>>>>>
>>>>> On Fri, Mar 11, 2016 at 11:17 AM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>
>>>>>> Sacrifice child in dmesg is your OS killing the process with the most
>>>>>> ram. That means you're actually running out of memory at the Linux level
>>>>>> outside of the JVM.
>>>>>>
>>>>>> Are you running anything other than Cassandra on this box?
>>>>>>
>>>>>> If so, does it have a memory leak?
>>>>>>
>>>>>> all the best,
>>>>>>
>>>>>> Sebastián
>>>>>> On Mar 11, 2016 11:14 AM, "Adam Plumb" <apl...@fiksu.com> wrote:
>>>>>>
>>>>>>> I've got a new cluster of 18 nodes running Cassandra 3.4 that I just
>>>>>>> launched and loaded data into yesterday (roughly 2TB of total storage)
>>>>>>> and am seeing runaway memory usage. These nodes are EC2 c3.4xlarges with
>>>>>>> 30GB RAM and the heap size is set to 8G with a new heap size of 1.6G.
>>>>>>>
>>>>>>> Last night I finished loading up the data, then ran an incremental
>>>>>>> repair on one of the nodes just to ensure that everything was working
>>>>>>> (nodetool repair). Overnight all 18 nodes ran out of memory and were
>>>>>>> killed by the OOM killer. I restarted them this morning and they all
>>>>>>> came up fine, but just started churning through memory and got killed
>>>>>>> again. I restarted them again and they're doing the same thing. I'm not
>>>>>>> getting any errors in the system log, since the process is getting
>>>>>>> killed abruptly (which makes me think this is a native memory issue,
>>>>>>> not heap).
>>>>>>>
>>>>>>> Obviously this behavior isn't the best. I'm willing to provide any
>>>>>>> data people need to help debug this; these nodes are still up and
>>>>>>> running. I'm also in IRC if anyone wants to jump on there.
>>>>>>>
>>>>>>> Here is the output of ps aux:
>>>>>>>
>>>>>>>> 497 64351 108 89.5 187156072 27642988 ? SLl 15:13 62:15
>>>>>>>> java -ea -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities
>>>>>>>> -XX:ThreadPriorityPolicy=42 -Xms7536M -Xmx7536M -Xmn1600M
>>>>>>>> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
>>>>>>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>>>>>>>> -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
>>>>>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>>>> -XX:+UseTLAB -XX:MaxGCPauseMillis=200
>>>>>>>> -XX:InitiatingHeapOccupancyPercent=45
>>>>>>>> -XX:-ParallelRefProcEnabled -XX:-AlwaysPreTouch -XX:+UseBiasedLocking
>>>>>>>> -XX:+UseTLAB -XX:+ResizeTLAB -Djava.net.preferIPv4Stack=true
>>>>>>>> -Dcom.sun.management.jmxremote.port=7199
>>>>>>>> -Dcom.sun.management.jmxremote.rmi.port=7199
>>>>>>>> -Dcom.sun.management.jmxremote.ssl=false
>>>>>>>> -Dcom.sun.management.jmxremote.authenticate=false
>>>>>>>> -XX:+CMSClassUnloadingEnabled -Dlogback.configurationFile=logback.xml
>>>>>>>> -Dcassandra.logdir=/usr/local/cassandra/logs
>>>>>>>> -Dcassandra.storagedir=/usr/local/cassandra/data
>>>>>>>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
>>>>>>>> -cp /usr/local/cassandra/conf:
>>>>>>>> /usr/local/cassandra/build/classes/main:
>>>>>>>> /usr/local/cassandra/build/classes/thrift:
>>>>>>>> /usr/local/cassandra/lib/airline-0.6.jar:
>>>>>>>> /usr/local/cassandra/lib/antlr-runtime-3.5.2.jar:
>>>>>>>> /usr/local/cassandra/lib/apache-cassandra-3.4.jar:
>>>>>>>> /usr/local/cassandra/lib/apache-cassandra-clientutil-3.4.jar:
>>>>>>>> /usr/local/cassandra/lib/apache-cassandra-thrift-3.4.jar:
>>>>>>>> /usr/local/cassandra/lib/asm-5.0.4.jar:
>>>>>>>> /usr/local/cassandra/lib/cassandra-driver-core-3.0.0-shaded.jar:
>>>>>>>> /usr/local/cassandra/lib/commons-cli-1.1.jar:
>>>>>>>> /usr/local/cassandra/lib/commons-codec-1.2.jar:
>>>>>>>> /usr/local/cassandra/lib/commons-lang3-3.1.jar:
>>>>>>>> /usr/local/cassandra/lib/commons-math3-3.2.jar:
>>>>>>>> /usr/local/cassandra/lib/compress-lzf-0.8.4.jar:
>>>>>>>> /usr/local/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:
>>>>>>>> /usr/local/cassandra/lib/concurrent-trees-2.4.0.jar:
>>>>>>>> /usr/local/cassandra/lib/disruptor-3.0.1.jar:
>>>>>>>> /usr/local/cassandra/lib/ecj-4.4.2.jar:
>>>>>>>> /usr/local/cassandra/lib/guava-18.0.jar:
>>>>>>>> /usr/local/cassandra/lib/high-scale-lib-1.0.6.jar:
>>>>>>>> /usr/local/cassandra/lib/hppc-0.5.4.jar:
>>>>>>>> /usr/local/cassandra/lib/jackson-core-asl-1.9.2.jar:
>>>>>>>> /usr/local/cassandra/lib/jackson-mapper-asl-1.9.2.jar:
>>>>>>>> /usr/local/cassandra/lib/jamm-0.3.0.jar:
>>>>>>>> /usr/local/cassandra/lib/javax.inject.jar:
>>>>>>>> /usr/local/cassandra/lib/jbcrypt-0.3m.jar:
>>>>>>>> /usr/local/cassandra/lib/jcl-over-slf4j-1.7.7.jar:
>>>>>>>> /usr/local/cassandra/lib/jflex-1.6.0.jar:
>>>>>>>> /usr/local/cassandra/lib/jna-4.0.0.jar:
>>>>>>>> /usr/local/cassandra/lib/joda-time-2.4.jar:
>>>>>>>> /usr/local/cassandra/lib/json-simple-1.1.jar:
>>>>>>>> /usr/local/cassandra/lib/libthrift-0.9.2.jar:
>>>>>>>> /usr/local/cassandra/lib/log4j-over-slf4j-1.7.7.jar:
>>>>>>>> /usr/local/cassandra/lib/logback-classic-1.1.3.jar:
>>>>>>>> /usr/local/cassandra/lib/logback-core-1.1.3.jar:
>>>>>>>> /usr/local/cassandra/lib/lz4-1.3.0.jar:
>>>>>>>> /usr/local/cassandra/lib/metrics-core-3.1.0.jar:
>>>>>>>> /usr/local/cassandra/lib/metrics-logback-3.1.0.jar:
>>>>>>>> /usr/local/cassandra/lib/netty-all-4.0.23.Final.jar:
>>>>>>>> /usr/local/cassandra/lib/ohc-core-0.4.2.jar:
>>>>>>>> /usr/local/cassandra/lib/ohc-core-j8-0.4.2.jar:
>>>>>>>> /usr/local/cassandra/lib/primitive-1.0.jar:
>>>>>>>> /usr/local/cassandra/lib/reporter-config3-3.0.0.jar:
>>>>>>>> /usr/local/cassandra/lib/reporter-config-base-3.0.0.jar:
>>>>>>>> /usr/local/cassandra/lib/sigar-1.6.4.jar:
>>>>>>>> /usr/local/cassandra/lib/slf4j-api-1.7.7.jar:
>>>>>>>> /usr/local/cassandra/lib/snakeyaml-1.11.jar:
>>>>>>>> /usr/local/cassandra/lib/snappy-java-1.1.1.7.jar:
>>>>>>>> /usr/local/cassandra/lib/snowball-stemmer-1.3.0.581.1.jar:
>>>>>>>> /usr/local/cassandra/lib/ST4-4.0.8.jar:
>>>>>>>> /usr/local/cassandra/lib/stream-2.5.2.jar:
>>>>>>>> /usr/local/cassandra/lib/thrift-server-0.3.7.jar:
>>>>>>>> /usr/local/cassandra/lib/jsr223/*/*.jar
>>>>>>>> org.apache.cassandra.service.CassandraDaemon
>>>>>>>
>>>>>>>
>>>>>>> Here is some dmesg output:
>>>>>>>
>>>>>>> [40003.010117] java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
>>>>>>> [40003.013042] java cpuset=/ mems_allowed=0
>>>>>>> [40003.014789] CPU: 3 PID: 37757 Comm: java Tainted: G E 4.1.7-15.23.amzn1.x86_64 #1
>>>>>>> [40003.017852] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
>>>>>>> [40003.020066] 0000000000000000 ffff8800ebaaba18 ffffffff814da12c 0000000000000000
>>>>>>> [40003.022870] ffff880763594c80 ffff8800ebaabac8 ffffffff814d7939 ffff8800ebaaba78
>>>>>>> [40003.025674] ffffffff811bf8f7 ffff880770679c00 ffff88077001c190 0000000000000080
>>>>>>> [40003.028660] Call Trace:
>>>>>>> [40003.029613] [<ffffffff814da12c>] dump_stack+0x45/0x57
>>>>>>> [40003.031486] [<ffffffff814d7939>] dump_header+0x7f/0x1fe
>>>>>>> [40003.033390] [<ffffffff811bf8f7>] ? mem_cgroup_iter+0x137/0x3d0
>>>>>>> [40003.035475] [<ffffffff8107f496>] ? __queue_work+0x136/0x320
>>>>>>> [40003.037594] [<ffffffff8115d11c>] oom_kill_process+0x1cc/0x3b0
>>>>>>> [40003.039825] [<ffffffff8115d67e>] __out_of_memory+0x31e/0x530
>>>>>>> [40003.041938] [<ffffffff8115da2b>] out_of_memory+0x5b/0x80
>>>>>>> [40003.043857] [<ffffffff81162a79>] __alloc_pages_nodemask+0x8a9/0x8d0
>>>>>>> [40003.046105] [<ffffffff811a48fa>] alloc_page_interleave+0x3a/0x90
>>>>>>> [40003.048419] [<ffffffff811a79c3>] alloc_pages_vma+0x143/0x200
>>>>>>> [40003.050582] [<ffffffff81188035>] handle_mm_fault+0x1355/0x1770
>>>>>>> [40003.052674] [<ffffffff8118e4c5>] ?
do_mmap_pgoff+0x2f5/0x3c0
>>>>>>> [40003.054737] [<ffffffff8105dafc>] __do_page_fault+0x17c/0x420
>>>>>>> [40003.056858] [<ffffffff8118c976>] ? SyS_mmap_pgoff+0x116/0x270
>>>>>>> [40003.059082] [<ffffffff8105ddc2>] do_page_fault+0x22/0x30
>>>>>>> [40003.061084] [<ffffffff814e2ad8>] page_fault+0x28/0x30
>>>>>>> [40003.062938] Mem-Info:
>>>>>>> [40003.063762] active_anon:5437903 inactive_anon:1025 isolated_anon:0
>>>>>>>  active_file:51 inactive_file:8 isolated_file:0
>>>>>>>  unevictable:2088582 dirty:0 writeback:0 unstable:0
>>>>>>>  slab_reclaimable:82028 slab_unreclaimable:12209
>>>>>>>  mapped:31065 shmem:20 pagetables:37089 bounce:0
>>>>>>>  free:35830 free_pcp:3141 free_cma:0
>>>>>>> [40003.075549] Node 0 DMA free:15872kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:32kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>>>>>>> [40003.090267] lowmem_reserve[]: 0 3746 30128 30128
>>>>>>> [40003.092182] Node 0 DMA32 free:108236kB min:2756kB low:3444kB high:4132kB active_anon:2400616kB inactive_anon:4060kB active_file:0kB inactive_file:0kB unevictable:1049732kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3840296kB mlocked:1049732kB dirty:4kB writeback:0kB mapped:16564kB shmem:12kB slab_reclaimable:243852kB slab_unreclaimable:8832kB kernel_stack:1152kB pagetables:16532kB unstable:0kB bounce:0kB free_pcp:5716kB local_pcp:220kB free_cma:0kB writeback_tmp:0kB pages_scanned:5408 all_unreclaimable? yes
>>>>>>> [40003.108802] lowmem_reserve[]: 0 0 26382 26382
>>>>>>> [40003.110578] Node 0 Normal free:19212kB min:19412kB low:24264kB high:29116kB active_anon:19350996kB inactive_anon:40kB active_file:212kB inactive_file:80kB unevictable:7304596kB isolated(anon):0kB isolated(file):0kB present:27525120kB managed:27015196kB mlocked:7304596kB dirty:0kB writeback:0kB mapped:107696kB shmem:68kB slab_reclaimable:84228kB slab_unreclaimable:40004kB kernel_stack:10000kB pagetables:131824kB unstable:0kB bounce:0kB free_pcp:6848kB local_pcp:692kB free_cma:0kB writeback_tmp:0kB pages_scanned:38332 all_unreclaimable? yes
>>>>>>> [40003.128300] lowmem_reserve[]: 0 0 0 0
>>>>>>> [40003.129844] Node 0 DMA: 0*4kB 0*8kB 0*16kB 2*32kB (UE) 3*64kB (UE) 2*128kB (UE) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ER) 2*4096kB (M) = 15872kB
>>>>>>> [40003.135917] Node 0 DMA32: 193*4kB (UEM) 254*8kB (UEM) 714*16kB (UE) 1344*32kB (UEMR) 249*64kB (UEMR) 120*128kB (UER) 53*256kB (ER) 10*512kB (ER) 1*1024kB (E) 0*2048kB 0*4096kB = 108244kB
>>>>>>> [40003.142956] Node 0 Normal: 3956*4kB (UE) 0*8kB 1*16kB (R) 8*32kB (R) 3*64kB (R) 2*128kB (R) 3*256kB (R) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 19360kB
>>>>>>> [40003.148749] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>>>>>> [40003.151777] 31304 total pagecache pages
>>>>>>> [40003.153288] 0 pages in swap cache
>>>>>>> [40003.154528] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>> [40003.156377] Free swap = 0kB
>>>>>>> [40003.157423] Total swap = 0kB
>>>>>>> [40003.158465] 7864221 pages RAM
>>>>>>> [40003.159522] 0 pages HighMem/MovableOnly
>>>>>>> [40003.160984] 146372 pages reserved
>>>>>>> [40003.162244] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
>>>>>>> [40003.165398] [ 2560]     0  2560     2804      181      11       3        0         -1000 udevd
>>>>>>> [40003.168638] [ 3976]     0  3976     2334      123       9       3        0             0 dhclient
>>>>>>> [40003.171895] [ 4017]     0  4017    11626       89      23       4        0         -1000 auditd
>>>>>>> [40003.175080] [ 4035]     0  4035    61861       99      23       3        0             0 rsyslogd
>>>>>>> [40003.178198] [ 4046]     0  4046     3462       98      10       3        0             0 irqbalance
>>>>>>> [40003.181559] [ 4052]     0  4052     1096       22       7       3        0             0 rngd
>>>>>>> [40003.184683] [ 4067]    32  4067     8815       99      22       3        0             0 rpcbind
>>>>>>> [40003.187772] [ 4084]    29  4084     9957      201      24       3        0             0 rpc.statd
>>>>>>> [40003.191099] [ 4115]    81  4115     5442       60      15       3        0             0 dbus-daemon
>>>>>>> [40003.194438] [ 4333]     0  4333    19452      522      40       3        0         -1000 sshd
>>>>>>> [40003.197432] [ 4361]    38  4361     7321      562      19       3        0             0 ntpd
>>>>>>> [40003.200609] [ 4376]     0  4376    22238      720      46       3        0             0 sendmail
>>>>>>> [40003.203868] [ 4384]    51  4384    20103      674      41       3        0             0 sendmail
>>>>>>> [40003.206963] [ 4515]     0  4515     4267       38      13       3        0             0 atd
>>>>>>> [40003.210100] [ 6730]     0  6730    29888      547      13       3        0             0 crond
>>>>>>> [40003.213267] [13533]   497 13533 47235415  7455314   36074     167        0             0 java
>>>>>>> [40003.216364] [13674]   498 13674    49154     3168      51       3        0             0 supervisord
>>>>>>> [40003.219721] [13680]   498 13680    51046     5350      69       3        0             0 python
>>>>>>> [40003.222908] [13682]   498 13682    36172     5602      75       3        0             0 python
>>>>>>> [40003.225952] [13683]   498 13683    32633     5319      68       3        0             0 python
>>>>>>> [40003.229108] [13684]   498 13684    29577     5003      63       3        0             0 python
>>>>>>> [40003.232263] [13719]   498 13719  1035920    41287     234       8        0             0 java
>>>>>>> [40003.235287] [13753]   498 13753    34605     5645      70       3        0             0 python
>>>>>>> [40003.238322] [14143]     0 14143     1615      420       9       3        0             0 agetty
>>>>>>> [40003.241582] [14145]     0 14145     1078      377       8       3        0             0 mingetty
>>>>>>> [40003.244752] [14147]     0 14147     1078      354       8       3        0             0 mingetty
>>>>>>> [40003.247833] [14149]     0 14149     1078      373       8       3        0             0 mingetty
>>>>>>> [40003.251193] [14151]     0 14151     1078      367       7       3        0             0 mingetty
>>>>>>> [40003.254342] [14153]     0 14153     1078      348       8       3        0             0 mingetty
>>>>>>> [40003.257443] [14154]     0 14154     2803      182      10       3        0         -1000 udevd
>>>>>>> [40003.260688] [14155]     0 14155     2803      182      10       3        0         -1000 udevd
>>>>>>> [40003.263782] [14157]     0 14157     1078      369       8       3        0             0 mingetty
>>>>>>> [40003.266895] Out of memory: Kill process 13533 (java) score 970 or sacrifice child
>>>>>>> [40003.269702] Killed process 13533 (java) total-vm:188941660kB, anon-rss:29710828kB, file-rss:110428kB
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
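
As a cross-check on the dmesg output above, here is a small Python sketch of the arithmetic. It assumes 4 KiB pages (the usual x86_64 page size, not stated in the log) and that the rss column of the process table is counted in pages. All input numbers are copied from the thread; the variable and function names are illustrative only.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, the usual x86_64 size
KB_PER_GIB = 1024 * 1024

# Values copied from the dmesg output and JVM flags in this thread.
java_rss_pages = 7455314    # rss column for pid 13533 (java)
anon_rss_kb = 29710828      # from the "Killed process" line
file_rss_kb = 110428        # from the "Killed process" line
total_ram_pages = 7864221   # from the "pages RAM" line
heap_mb = 7536              # from -Xmx7536M


def summarize():
    """Convert the page counts to GiB and estimate non-heap usage."""
    rss_kb = java_rss_pages * PAGE_KB
    rss_gib = rss_kb / KB_PER_GIB
    heap_gib = heap_mb / 1024
    return {
        "rss_kb": rss_kb,
        "rss_gib": rss_gib,
        "ram_gib": total_ram_pages * PAGE_KB / KB_PER_GIB,
        "heap_gib": heap_gib,
        # Lower bound on resident non-heap memory: the heap cannot
        # account for more than its configured maximum.
        "non_heap_gib": rss_gib - heap_gib,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the page count reproduces the kernel's own figures exactly (7455314 pages x 4 kB = anon-rss + file-rss), which puts the resident set at roughly 28.4 GiB on a machine with about 30 GiB of RAM. With a 7536 MB heap cap, at least about 21 GiB of that sits outside the Java heap, which is consistent with the native-memory diagnosis in the thread.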