Hi,
thanks for the answer. There were no large insertions, and the
saved_caches dir had a reasonable size. I tried deleting the caches and
setting key_cache_size_in_mb to zero, but it didn't help.
Today our virtual hardware provider raised the CPUs to 4 and the memory
to 32GB, and doubled the disk size, and the nodes are stable again. So it
was probably an issue of a severe lack of resources.
About HEAP_NEWSIZE, your suggestion is quite intriguing. I thought it
was better to set it to 100MB per core, so in my case I set it to 200M and
now I should set it to 400M. Do larger values help without being harmful?
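As a side note, the "100MB per core" rule I was following can be sketched like this (an illustrative script, not the actual cassandra-env.sh; it assumes the young generation is capped at a quarter of the heap, with example figures of 4 cores and an 8GB heap):

```shell
#!/bin/sh
# Sketch of the young-generation sizing rule of thumb:
# HEAP_NEWSIZE = min(100 MB * number of cores, 1/4 of MAX_HEAP_SIZE).
# The core count and heap size below are illustrative assumptions.
system_cpu_cores=4
max_heap_mb=8192

quarter_heap_mb=$((max_heap_mb / 4))
cores_based_mb=$((100 * system_cpu_cores))

if [ "$cores_based_mb" -lt "$quarter_heap_mb" ]; then
    heap_newsize_mb=$cores_based_mb
else
    heap_newsize_mb=$quarter_heap_mb
fi

echo "HEAP_NEWSIZE=${heap_newsize_mb}M"
```

With these figures the cores-based value (400M) is well under a quarter of the heap (2048M), so it wins.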
Regards,
Paolo
On 27/05/2016 03:05, Mike Yeap wrote:
Hi Paolo,
a) was there any large insertion done?
b) are there a lot of files in the saved_caches directory?
c) would you consider increasing HEAP_NEWSIZE to, say, 1200M?
Regards,
Mike Yeap
On Fri, May 27, 2016 at 12:39 AM, Paolo Crosato
<paolo.cros...@targaubiest.com> wrote:
Hi,
we are running a cluster of 4 nodes, each one has the same sizing:
2 cores, 16G ram and 1TB of disk space.
Every node runs Cassandra 2.0.17, Oracle Java 1.7.0_45, and CentOS 6
with kernel 2.6.32-431.17.1.el6.x86_64.
Two nodes are running just fine, the other two have started to go
OOM at every start.
This is the error we get:
INFO [ScheduledTasks:1] 2016-05-26 18:15:58,460 StatusLogger.java (line 70) ReadRepairStage            0      0     116     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:15:58,462 StatusLogger.java (line 70) MutationStage             31   1369   20526     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:15:58,590 StatusLogger.java (line 70) ReplicateOnWriteStage      0      0       0     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:15:58,591 StatusLogger.java (line 70) GossipStage                0      0     335     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:16:04,195 StatusLogger.java (line 70) CacheCleanupExecutor       0      0       0     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:16:06,526 StatusLogger.java (line 70) MigrationStage             0      0       0     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:16:06,527 StatusLogger.java (line 70) MemoryMeter                1      4      26     0     0
INFO [ScheduledTasks:1] 2016-05-26 18:16:06,527 StatusLogger.java (line 70) ValidationExecutor         0      0       0     0     0
DEBUG [MessagingService-Outgoing-/10.255.235.19] 2016-05-26 18:16:06,518 OutboundTcpConnection.java (line 290) attempting to connect to /10.255.235.19
INFO [GossipTasks:1] 2016-05-26 18:16:22,912 Gossiper.java (line 992) InetAddress /10.255.235.28 is now DOWN
INFO [ScheduledTasks:1] 2016-05-26 18:16:22,952 StatusLogger.java (line 70) FlushWriter                1      5      47     0    25
INFO [ScheduledTasks:1] 2016-05-26 18:16:22,953 StatusLogger.java (line 70) InternalResponseStage      0      0       0     0     0
ERROR [ReadStage:27] 2016-05-26 18:16:29,250 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:27,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:153)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:434)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:387)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
    at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1619)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1438)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:340)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:89)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
ERROR [ReadStage:32] 2016-05-26 18:16:29,357 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:32,5,main]
java.lang.OutOfMemoryError: Java heap space
    [stack trace identical to the one above, from RandomAccessReader.readBytes down to ReadVerbHandler.doVerb]
We are observing that the heap is never freed: it keeps growing until it
reaches the limit, then the OOM errors appear, and after a short while
the node crashes.
These are the relevant settings in cassandra_env for one of the
crashing nodes:
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="200M"
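(If it helps to compare: here is a sketch of the heuristic that, as I understand it, the stock 2.0-era cassandra-env.sh applies when these variables are left unset, plugged with this node's figures of 2 cores and 16GB RAM. It is an illustration of the formula, not the actual script.)

```shell
#!/bin/sh
# Sketch of the stock cassandra-env.sh auto-sizing heuristic (2.0-era),
# as I understand it; node figures below are this cluster's (2 cores, 16 GB).
system_memory_mb=16384
system_cpu_cores=2

half_mem=$((system_memory_mb / 2))
quarter_mem=$((system_memory_mb / 4))

# MAX_HEAP_SIZE = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
a=$(( half_mem < 1024 ? half_mem : 1024 ))
b=$(( quarter_mem < 8192 ? quarter_mem : 8192 ))
max_heap_mb=$(( a > b ? a : b ))

# HEAP_NEWSIZE = min(100 MB * cores, 1/4 of MAX_HEAP_SIZE)
cores_based_mb=$((100 * system_cpu_cores))
quarter_heap_mb=$((max_heap_mb / 4))
heap_newsize_mb=$(( cores_based_mb < quarter_heap_mb ? cores_based_mb : quarter_heap_mb ))

echo "MAX_HEAP_SIZE=${max_heap_mb}M HEAP_NEWSIZE=${heap_newsize_mb}M"
```

So on this box the script would have picked roughly 4G/200M, which suggests our explicit 6G heap is already above the auto-computed value.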
This is the complete error log http://pastebin.com/QGaACyhR
This is cassandra_env http://pastebin.com/6SLeVmtv
This is cassandra.yaml http://pastebin.com/wb1axHtV
Can anyone help?
Regards,
Paolo Crosato
--
Paolo Crosato
Software engineer/Custom Solutions
e-mail: paolo.cros...@targaubiest.com
--
Paolo Crosato
Software engineer/Custom Solutions
e-mail: paolo.cros...@targaubiest.com