Re: Cassandra compaction appears to stall, node becomes partially unresponsive

Aiman Parvaiz Wed, 22 Jul 2015 15:23:28 -0700

Hi Bryan
How's GC behaving on these boxes?

On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng <br...@blockcypher.com> wrote:


> Hi there,
>
> Within our Cassandra cluster, we're observing, on occasion, one or two
> nodes at a time becoming partially unresponsive.
>
> We're running 2.1.7 across the entire cluster.
>
> nodetool still reports the node as being healthy, and it does respond to
> some local queries; however, the CPU is pegged at 100%. One common thread
> (heh) each time this happens is that there always seems to be one or more
> compaction threads running (via nodetool tpstats), and some appear to be
> stuck (active count doesn't change, pending count doesn't decrease). A
> request for nodetool compactionstats hangs with no response.
>
> Each time we've seen this, the only thing that appears to resolve the
> issue is a restart of the Cassandra process; the restart does not appear to
> be clean, and requires one or more attempts (or, on occasion, a kill -9).
>
> There does not seem to be any pattern to what machines are affected; the
> nodes thus far have been different instances on different physical machines
> and on different racks.
>
> Has anyone seen this before? Alternatively, when this happens again, what
> data can we collect that would help with the debugging process (in addition
> to tpstats)?
>
> Thanks in advance,
>
> Bryan
>
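The stall symptom Bryan describes (a pool whose active count doesn't change while its pending count doesn't decrease) can be checked mechanically by comparing two tpstats snapshots taken a few seconds apart. The following is a minimal sketch, not a Cassandra tool: the helpers `parse_tpstats` and `stalled_pools` are hypothetical, and the parser assumes the 2.1-style tpstats layout (a "Pool Name" header, rows ending in five integer columns, and a blank line before the dropped-message table). Requiring active > 0 and an unchanged completed count is an extra assumption added to skip idle or slow-but-progressing pools.

```python
# Hypothetical helpers: compare two `nodetool tpstats` snapshots and report
# thread pools that look stalled. Assumes the 2.1-style output layout.

def parse_tpstats(text):
    """Return {pool_name: (active, pending, completed)} from tpstats output.

    Assumes a header row starting with "Pool Name", one row per pool ending
    in five integer columns (Active, Pending, Completed, Blocked,
    All time blocked), and a blank line before the dropped-message table,
    which is ignored.
    """
    pools = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("Pool Name"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # blank line ends the thread-pool table
        fields = line.split()
        if len(fields) < 6:
            continue  # not a pool row
        name = " ".join(fields[:-5])
        # Active, Pending, Completed are the first three of the last five columns.
        active, pending, completed = (int(x) for x in fields[-5:-2])
        pools[name] = (active, pending, completed)
    return pools


def stalled_pools(before, after):
    """Names of pools whose work appears stuck between two snapshots.

    Heuristic from the thread: active count unchanged and pending count not
    decreasing. Assumption added here: also require active > 0 and no growth
    in the completed count.
    """
    stalled = []
    for name, (a0, p0, c0) in before.items():
        if name not in after:
            continue
        a1, p1, c1 = after[name]
        if a1 > 0 and a1 == a0 and p1 >= p0 and c1 == c0:
            stalled.append(name)
    return sorted(stalled)


# Made-up sample snapshots, for illustration only.
SNAPSHOT_1 = """Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0           1000         0                 0
CompactionExecutor                2        15            300         0                 0

Message type           Dropped
READ                         0
"""

SNAPSHOT_2 = """Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0           1200         0                 0
CompactionExecutor                2        15            300         0                 0

Message type           Dropped
READ                         0
"""

print(stalled_pools(parse_tpstats(SNAPSHOT_1), parse_tpstats(SNAPSHOT_2)))
# prints ['CompactionExecutor']
```

On a live node the two strings could instead come from running nodetool tpstats twice, for example via Python's `subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout`. A heuristic like this only flags candidates; it does not explain why a compaction thread is stuck.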



-- 
*Aiman Parvaiz*
Lead Systems Architect
ai...@flipagram.com
cell: 213-300-6377
http://flipagram.com/apz
