C* 1.2.x vs Gossip marking DOWN/UP

Michael Fong Wed, 13 Apr 2016 01:59:09 -0700

Hi, all


We have a 4-node Cassandra cluster (C* 1.2.x) in which one node marked all the 
other 3 nodes DOWN, and they came back UP a few seconds later. A compaction of 
roughly 10 MB had kicked in about a minute earlier, and it was followed by the 
node marking all the other nodes DOWN. In other words, in the system.log we 
see:
00:00:00 Compacting ....
00:00:03 Compacted 8 sstables ... 10~ megabytes
00:01:06 InetAddress /x.x.x.4 is now DOWN
00:01:06 InetAddress /x.x.x.3 is now DOWN
00:01:06 InetAddress /x.x.x.1 is now DOWN

There was no significant GC activity in gc.log. We have heard that heavy 
compaction activity can cause this behavior, but we cannot work out why it 
would happen. How could a compaction operation stop the Gossip thread from 
performing its heartbeat checks? Has anyone experienced this kind of behavior 
before?

Thanks in advance!

Sincerely,

Michael Fong
