Hi C* users,

Recently a couple of my nodes crashed (on different dates). I don't have
core dumps, but the JVM crash logs look like this:
===========================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8f608c8e40, pid=6916, tid=140253195458304
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [liblz4-java6471621810388748482.so+0x5e40]  LZ4_decompress_fast+0xa0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
...
---------------  T H R E A D  ---------------


Current thread (0x00007f8f5c7b2d50):  JavaThread "CompactionExecutor:11952" daemon [_thread_in_native, id=16219, stack(0x00007f8f3de0d000,0x00007f8f3de4e000)]
...
Stack: [0x00007f8f3de0d000,0x00007f8f3de4e000],  sp=0x00007f8f3de4c0e0,  free space=252k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [liblz4-java6471621810388748482.so+0x5e40]  LZ4_decompress_fast+0xa0

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 4150  net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast([BLjava/nio/ByteBuffer;I[BLjava/nio/ByteBuffer;II)I (0 bytes) @ 0x00007f8f791e4723 [0x00007f8f791e4680+0xa3]
J 19836 C2 org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap()V (354 bytes) @ 0x00007f8f7b714930 [0x00007f8f7b714320+0x610]
J 6662 C2 org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(Lorg/apache/cassandra/io/sstable/format/SSTableReader;Lorg/apache/cassandra/io/util/FileDataInput;Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/RowIndexEntry;Lorg/apache/cassandra/db/filter/ColumnFilter;Z)V (389 bytes) @ 0x00007f8f79c1cdb8 [0x00007f8f79c1c500+0x8b8]
J 22393 C2 org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(Lorg/apache/cassandra/db/ColumnFamilyStore;Z)Lorg/apache/cassandra/db/rows/UnfilteredRowIterator; (818 bytes) @ 0x00007f8f7c1d4364 [0x00007f8f7c1d2f40+0x1424]
J 22166 C1 org.apache.cassandra.db.Keyspace.indexPartition(Lorg/apache/cassandra/db/DecoratedKey;Lorg/apache/cassandra/db/ColumnFamilyStore;Ljava/util/Set;)V (274 bytes) @ 0x00007f8f7beb6304 [0x00007f8f7beb5420+0xee4]
j  org.apache.cassandra.index.SecondaryIndexBuilder.build()V+46
j  org.apache.cassandra.db.compaction.CompactionManager$11.run()V+18
J 22293 C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007f8f7b17727c [0x00007f8f7b176da0+0x4dc]
J 21302 C2 java.lang.Thread.run()V (17 bytes) @ 0x00007f8f79fe59f8 [0x00007f8f79fe59a0+0x58]
v  ~StubRoutines::call_stub
...
VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap:
 par new generation   total 368640K, used 123009K [0x00000006d5e00000, 0x00000006eee00000, 0x00000006eee00000)
  eden space 327680K,  34% used [0x00000006d5e00000, 0x00000006dcaf35c8, 0x00000006e9e00000)
  from space 40960K,  27% used [0x00000006e9e00000, 0x00000006ea92cf00, 0x00000006ec600000)
  to   space 40960K,   0% used [0x00000006ec600000, 0x00000006ec600000, 0x00000006eee00000)
 concurrent mark-sweep generation total 3426304K, used 1288977K [0x00000006eee00000, 0x00000007c0000000, 0x00000007c0000000)
 Metaspace       used 41685K, capacity 42832K, committed 43156K, reserved 1087488K
  class space    used 4455K, capacity 4702K, committed 4756K, reserved 1048576K
...
OS:DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.1 LTS"

uname:Linux 3.2.0-35-virtual #55-Ubuntu SMP Wed Dec 5 18:02:05 UTC 2012 x86_64
libc:glibc 2.15 NPTL 2.15
rlimit: STACK 8192k, CORE 0k, NPROC 119708, NOFILE 100000, AS infinity
load average:2.96 1.08 0.60

What am I missing?
Both crashes seem to have happened during compaction, while running native
code (LZ4).
Both crashes happened while the nodes were running a scheduled repair (so
under increased load).
The machines have 4 vCPUs and 15 GB of RAM (m1.xlarge).
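
To check that the two crash logs really share the same signature, something
like the following could pull the relevant fields out of each file. This is
only a minimal sketch: the function name summarize_hs_err is hypothetical, and
it assumes the unwrapped hs_err file as the JVM wrote it, not the email-wrapped
copy above.

```python
import re


def summarize_hs_err(text: str) -> dict:
    """Extract signal, problematic frame and crashing thread from hs_err text.

    Hypothetical helper for comparing crash logs; any field that cannot be
    found is returned as None.
    """
    # e.g. "#  SIGSEGV (0xb) at pc=..."
    signal = re.search(r"^#[ \t]+(SIG[A-Z]+)[ \t]+\(0x[0-9a-f]+\)", text, re.MULTILINE)
    # The frame itself is on the line after "# Problematic frame:"
    frame = re.search(r"^# Problematic frame:[ \t]*\n#[ \t]*(.+)$", text, re.MULTILINE)
    # e.g. 'Current thread (0x...):  JavaThread "CompactionExecutor:11952" ...'
    thread = re.search(
        r'^Current thread \(0x[0-9a-f]+\):[ \t]+JavaThread "([^"]+)"',
        text,
        re.MULTILINE,
    )
    thread_name = thread.group(1) if thread else None
    return {
        "signal": signal.group(1) if signal else None,
        "frame": frame.group(1).strip() if frame else None,
        "thread": thread_name,
        # Thread pool name without the per-thread counter after the colon
        "pool": thread_name.split(":")[0] if thread_name else None,
    }
```

Running it over the contents of both files and comparing the resulting
dictionaries would show whether the signal, frame and thread pool match.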
Any hint?

Best,
