In fact, I truncated the hints table to stabilize the cluster. From the heap dumps I was able to identify the table against which numerous queries were running. I then focused on the system_traces.sessions table around the time the OOM occurred. It turned out that a full table scan on a large table was what caused the OOM.
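For reference, this is roughly what I ran (we are on a pre-3.0 cluster, so hints live in the system.hints table and traces in system_traces.sessions; the exact column list is from memory, so adjust for your version):

$ nodetool truncatehints        # drop the hint backlog (only if your consistency levels make that safe)
$ cqlsh
cqlsh> SELECT session_id, started_at, coordinator, request, duration
   ... FROM system_traces.sessions LIMIT 50;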
Thanks, every one of you.

________________________________
From: Jeff Jirsa <jji...@apache.org>
Sent: Tuesday, March 7, 2017 1:19 PM
To: user@cassandra.apache.org
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time

On 2017-03-03 09:18 (-0800), Shravan Ch <chall...@outlook.com> wrote:
>
> nodetool compactionstats -H
> pending tasks: 3
>    compaction type   keyspace   table   completed      total   unit   progress
>         Compaction     system   hints     28.5 GB   92.38 GB   bytes     30.85%
>

The hint buildup is also something that could have caused OOMs. Hints are stored for a given host in a single partition, which means it's common for a single row/partition to get huge if you have a single host flapping. If you see "Compacting large row" messages for the hint rows, I suspect you'll find that one of the hosts/rows is responsible for most of that 92GB of hints, which means when you try to deliver the hints, you'll read from a huge partition, which creates memory pressure (see: CASSANDRA-9754), leading to GC pauses (or OOMs), which then causes you to flap, which causes you to create more hints, which causes an ugly spiral.

In 3.0, hints were rewritten to avoid this problem, but short term you may need to truncate your hints to get healthy (assuming it's safe for you to do so, where 'safe' is based on your read+write consistency levels).
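For anyone finding this thread in the archives later: before truncating, you can sanity-check the single-huge-partition theory with nodetool. Something along these lines (assuming pre-3.0, where hints are an ordinary table in the system keyspace):

$ nodetool cfstats system.hints

Then look at the maximum compacted partition size it reports; if one flapping host is to blame, that single partition should account for most of the hint data. Truncating is then either nodetool truncatehints or TRUNCATE system.hints from cqlsh.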