It does appear to be a ulimit issue to some degree, since some of your settings are several times lower than recommended (namely nproc). The recommended production settings are documented at:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

  - memlock unlimited
  - nofile 100000
  - nproc 32768
  - as unlimited
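If those aren't persisted anywhere, a minimal sketch of the matching pam_limits entries (assuming Cassandra runs as a user named cassandra and that your distribution reads /etc/security/limits.d/; the file name below is hypothetical) would be:

    # /etc/security/limits.d/cassandra.conf -- hypothetical file name
    # "-" sets both the soft and hard limit
    cassandra  -  memlock  unlimited
    cassandra  -  nofile   100000
    cassandra  -  nproc    32768
    cassandra  -  as       unlimited

Note that on RHEL 6 a packaged file such as /etc/security/limits.d/90-nproc.conf can silently cap nproc below whatever you set elsewhere, so check there too.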
However, I'm also confident you have other issues as well that are going to be problematic. Namely, what is your heap setting? Can you grep for ERROR, WARN, dropped, and GCInspector in Cassandra's system.log and share the results?
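Something like the following is what I'm after (a sketch; it assumes the default packaged log location of /var/log/cassandra/system.log, so adjust the path if yours differs):

    # what heap the JVM was actually started with
    ps -ef | grep [j]ava | tr ' ' '\n' | grep -E '^-Xm[sx]'

    # errors, warnings, dropped messages, and GC pauses in the Cassandra log
    grep -E 'ERROR|WARN|dropped|GCInspector' /var/log/cassandra/system.log | tail -n 100

Frequent or long GCInspector pauses and dropped mutations would point at heap pressure in addition to the native allocation failure quoted below.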
On Fri, Dec 19, 2014 at 2:23 AM, 谢良 <xieli...@xiaomi.com> wrote:

> What's your vm.max_map_count setting?
>
> Best Regards,
> Liang
>
> ------------------------------
> From: Leon Oosterwijk <leon.oosterw...@macquarie.com>
> Sent: December 19, 2014 11:55
> To: user@cassandra.apache.org
> Subject: Cassandra 2.1.0 Crashes the JVM with OOM with heaps of memory free
>
> All,
>
> We have a Cassandra cluster which seems to be struggling a bit. I have one
> node which crashes continually, and others which crash sporadically. When
> they crash, it's with the JVM unable to allocate memory, even though there's
> heaps available. I suspect it's because of one table which is very big
> (500GB) and which has on the order of 500K-700K files in its directory. When
> I deleted the directory contents on the crashing node and ran a repair, the
> nodes around this node crashed while streaming the data. Here are the
> relevant bits from the crash file and environment.
>
> Any help would be appreciated.
>
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
> # Possible reasons:
> #   The system is out of physical RAM or swap space
> #   In 32 bit mode, the process size limit was hit
> # Possible solutions:
> #   Reduce memory load on the system
> #   Increase physical memory or swap space
> #   Check if swap backing store is full
> #   Use 64 bit Java on a 64 bit OS
> #   Decrease Java heap size (-Xmx/-Xms)
> #   Decrease number of Java threads
> #   Decrease Java thread stack sizes (-Xss)
> #   Set larger code cache with -XX:ReservedCodeCacheSize=
> # This output file may be truncated or incomplete.
> #
> # Out of Memory Error (os_linux.cpp:2671), pid=1104, tid=139950342317824
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.20-b23 mixed mode linux-amd64 compressed oops)
> # Failed to write core dump. Core dumps have been disabled. To enable core
> # dumping, try "ulimit -c unlimited" before starting Java again
> #
>
> ---------------  T H R E A D  ---------------
>
> Current thread (0x00007f4acabb1800): JavaThread "Thread-13" [_thread_new,
> id=19171, stack(0x00007f48ba6ca000,0x00007f48ba70b000)]
>
> Stack: [0x00007f48ba6ca000,0x00007f48ba70b000], sp=0x00007f48ba709a50, free space=254k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xa76cea]  VMError::report_and_die()+0x2ca
> V  [libjvm.so+0x4e52fb]  report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x8b
> V  [libjvm.so+0x8e4ec3]  os::Linux::commit_memory_impl(char*, unsigned long, bool)+0x103
> V  [libjvm.so+0x8e4f8c]  os::pd_commit_memory(char*, unsigned long, bool)+0xc
> V  [libjvm.so+0x8dce4a]  os::commit_memory(char*, unsigned long, bool)+0x2a
> V  [libjvm.so+0x8e33af]  os::pd_create_stack_guard_pages(char*, unsigned long)+0x7f
> V  [libjvm.so+0xa21bde]  JavaThread::create_stack_guard_pages()+0x5e
> V  [libjvm.so+0xa29954]  JavaThread::run()+0x34
> V  [libjvm.so+0x8e75f8]  java_start(Thread*)+0x108
> C  [libpthread.so.0+0x79d1]
>
> Memory: 4k page, physical 131988232k(694332k free), swap 37748728k(37748728k free)
>
> vm_info: Java HotSpot(TM) 64-Bit Server VM (25.20-b23) for linux-amd64 JRE (1.8.0_20-b26),
> built on Jul 30 2014 13:13:52 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
>
> time: Fri Dec 19 14:37:29 2014
> elapsed time: 2303 seconds (0d 0h 38m 23s)
>
> OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
>
> uname: Linux 2.6.32-431.5.1.el6.x86_64 #1 SMP Fri Jan 10 14:46:43 EST 2014 x86_64
> libc: glibc 2.12 NPTL 2.12
> rlimit: STACK 10240k, CORE 0k, NPROC 8192, NOFILE 65536, AS infinity
> load average: 4.18 4.79 4.54
>
> /proc/meminfo:
> MemTotal:        131988232 kB
> MemFree:            694332 kB
> Buffers:            837584 kB
> Cached:           51002896 kB
> SwapCached:              0 kB
> Active:           93953028 kB
> Inactive:         32850628 kB
> Active(anon):     70851112 kB
> Inactive(anon):    4713848 kB
> Active(file):     23101916 kB
> Inactive(file):   28136780 kB
> Unevictable:             0 kB
> Mlocked:                 0 kB
> SwapTotal:        37748728 kB
> SwapFree:         37748728 kB
> Dirty:               75752 kB
> Writeback:               0 kB
> AnonPages:        74963768 kB
> Mapped:             739884 kB
> Shmem:              601592 kB
> Slab:              3460252 kB
> SReclaimable:      3170124 kB
> SUnreclaim:         290128 kB
> KernelStack:         36224 kB
> PageTables:         189772 kB
> NFS_Unstable:            0 kB
> Bounce:                  0 kB
> WritebackTmp:            0 kB
> CommitLimit:     169736960 kB
> Committed_AS:     92208740 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:        492032 kB
> VmallocChunk:   34291733296 kB
> HardwareCorrupted:       0 kB
> AnonHugePages:    67717120 kB
> HugePages_Total:         0
> HugePages_Free:          0
> HugePages_Rsvd:          0
> HugePages_Surp:          0
> Hugepagesize:         2048 kB
> DirectMap4k:          5056 kB
> DirectMap2M:       2045952 kB
> DirectMap1G:     132120576 kB
>
> Before you say it's a ulimit issue:
>
> [501]> ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1030998
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 8192
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 8192
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> Here's the file count on one of the nodes for this very big table:
>
> loosterw@NODE:/env/datacache/data/cassandra/data/datastore/bigtable-e58925706a3c11e4ba63adfbd009c4d6
> > ls | wc -l
> 588636
>
> Thanks,
>
> Leon

--
Ryan Svihla
Solution Architect, DataStax
http://www.datastax.com/
https://twitter.com/foundev
http://www.linkedin.com/pub/ryan-svihla/12/621/727/
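A closing note on 谢良's vm.max_map_count question, since the crash above is a native mmap failure with ample free RAM and that table directory holds roughly 589K files (each mmapped SSTable segment consumes map entries). As a sketch, you can compare the process's live mapping count against the kernel cap; pid 1104 is taken from the crash header above, and 131072 is only an example value:

    # current kernel cap on memory mappings per process (the kernel default is 65530)
    sysctl vm.max_map_count

    # how many mappings the Cassandra JVM holds right now
    wc -l /proc/1104/maps

    # if the two numbers are close, raise the cap, for example:
    sudo sysctl -w vm.max_map_count=131072

When a process sits at this cap, any further mmap call, including the one that creates the thread stack guard pages seen in the quoted stack trace, fails with exactly this kind of Out of Memory error even though plenty of physical memory is free.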