RE: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

Walsh, Stephen Mon, 21 Sep 2015 04:09:35 -0700

Although I didn't get an answer on this, it's worth noting that removing the 
compactions_in_progress folder resolved the issue.
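
For anyone hitting the same thing, here is a minimal sketch of that step in Python. It is not an official procedure: it assumes the node is stopped, that the data directory is /var/lib/cassandra/data (as in the path from the heap dump below), and that moving the folder aside is preferable to deleting it outright. The function name and the backup location are hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_compactions_in_progress(data_root, backup_root):
    """Move system/compactions_in_progress-* directories out of data_root.

    Only meant to be run while Cassandra is stopped on this node.
    Returns the list of destination paths (empty if nothing matched).
    """
    system_dir = Path(data_root) / "system"
    # Timestamped backup folder so repeated runs do not collide.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = Path(backup_root) / ("compactions_in_progress-backup-" + stamp)

    moved = []
    for table_dir in sorted(system_dir.glob("compactions_in_progress-*")):
        if not table_dir.is_dir():
            continue
        backup_dir.mkdir(parents=True, exist_ok=True)
        target = backup_dir / table_dir.name
        shutil.move(str(table_dir), str(target))
        moved.append(target)
    return moved


# Example call (paths are assumptions, adjust to your install):
# move_compactions_in_progress("/var/lib/cassandra/data", "/var/tmp")
```

Moving rather than deleting keeps the files around in case they are needed for later inspection; once the node starts cleanly the backup can be removed.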

From: Walsh, Stephen
Sent: 17 September 2015 16:37
To: 'user@cassandra.apache.org' <user@cassandra.apache.org>
Subject: RE: Cassandra shutdown during large number of compactions - now fails 
to start with OOM Exception

Some more info:

Looking at the Java heap dump file, I see about 400 SSTableScanners, one for 
each of our column families.
Each is about 200MB in size.
And (from what I can see) all of them are reading from a 
"compactions_in_progress-ka-000000-Data.db" file.

dfile  org.apache.cassandra.io.compress.CompressedRandomAccessReader path = 
"/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-71661-Data.db"
 131840 104

Steve

From: Walsh, Stephen
Sent: 17 September 2015 15:33
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Cassandra shutdown during large number of compactions - now fails to 
start with OOM Exception

Hey all, I was hoping someone has had a similar issue.
We're using 2.1.6 and shut down a testbed in AWS, thinking we were finished 
with it.
We started it back up today and saw that only 2 of 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down; 
when Cassandra tries to start up, we get an OutOfMemoryError.

INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening /var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space

It's not related to the key cache; we removed it and the issue is still present.
So we believe it's retrying all the compactions that were in progress when it 
went down.

We've set the heap size to half of the system's RAM (8GB in this case).

At the moment the only workaround we have is to empty the data / saved_cache / 
commit_log folders and let the node re-sync with the other nodes.

Has anyone seen this before and what have they done to solve it?
Can we remove unfinished compactions?

Steve

