Steve, thanks a ton! Removing compactions_in_progress helped! Now the node is running again.
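
For anyone who lands on this thread with the same error, here is a minimal sketch of that cleanup in Python. It assumes the node is stopped first and that the data directory is /var/lib/cassandra/data (a common default, but check data_file_directories in cassandra.yaml). The function name and the backup path are made up for illustration. It moves the directories aside instead of deleting them, so they can be put back if needed.

```python
import shutil
from pathlib import Path


def set_aside_compactions_in_progress(data_dir, backup_dir):
    """Move every system/compactions_in_progress* directory out of data_dir.

    Hypothetical helper: run it only while the Cassandra node is stopped.
    Returns the list of new locations of the moved directories.
    """
    system_dir = Path(data_dir) / "system"
    moved = []
    if not system_dir.is_dir():
        # Nothing to do if the assumed layout is not present.
        return moved
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    # The table directory may carry an id suffix, so match by name prefix.
    for entry in sorted(system_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith("compactions_in_progress"):
            target = backup / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved


# Example call (both paths are assumptions, adjust to your installation):
# set_aside_compactions_in_progress("/var/lib/cassandra/data",
#                                   "/var/tmp/compactions_in_progress.bak")
```

Other system tables are left untouched, and a missing data directory simply yields an empty list.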

p.s. Sorry for referring to you by your last name in my last email, I got confused.

On Thu, Dec 10, 2015 at 2:09 AM, Walsh, Stephen <stephen.wa...@aspect.com> wrote:

> 8GB is the max recommended for heap size and that’s if you have 32GB or
> more available.
>
> We use 6GB on our 16GB machines and its very stable
>
> The out of memory could be coming from cassandra reloading
> compactions_in_progress into memory, you can check this from the log files
> if needs be.
>
> You can safely delete this folder inside the data directory.
>
> This can happen if you didn’t stop cassandra with a drain command and wait
> for the compactions to finish.
>
> Last time we hit it – was due to testing HA when we forced killed an
> entire cluster.
>
> Steve
>
> *From:* Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
> *Sent:* 10 December 2015 02:49
> *To:* user@cassandra.apache.org
> *Subject:* Re: Unable to start one Cassandra node: OutOfMemoryError
>
> 8G is probably too small for a G1 heap. Raise your heap or try CMS instead.
>
> 71% of your heap is collections – may be a weird data model quirk, but try
> CMS first and see if that behaves better.
>
> *From: *Mikhail Strebkov
> *Reply-To: *"user@cassandra.apache.org"
> *Date: *Wednesday, December 9, 2015 at 5:26 PM
> *To: *"user@cassandra.apache.org"
> *Subject: *Unable to start one Cassandra node: OutOfMemoryError
>
> Hi everyone,
>
> While upgrading our 5 machines cluster from DSE version 4.7.1 (Cassandra
> 2.1.8) to DSE version: 4.8.2 (Cassandra 2.1.11) one of the nodes can't
> start with OutOfMemoryError.
>
> We're using HotSpot 64-Bit Server VM/1.8.0_45 and G1 garbage collector
> with 8 GiB heap.
>
> Average node size is 300 GiB.
>
> I looked at the heap dump with YourKit profiler (www.yourkit.com) and it
> was quite hard since it's so big, but can't get much out of it:
> http://i.imgur.com/fIRImma.png
>
> As far as I understand the report, there are 1,332,812 instances of
> org.apache.cassandra.db.Row which retain 8 GiB. I don't understand why all
> of them are still strongly reachable?
>
> Please help me to debug this. I don't know even where to start.
>
> I feel very uncomfortable with 1 node running 4.8.2, 1 node down and 3
> nodes running 4.7.1 at the same time.
>
> Thanks,
>
> Mikhail