Hi all,
We have a 4-node Cassandra cluster (C* 1.2.x) in which one node marked all 3 other nodes DOWN, and they came back UP a few seconds later. A compaction of roughly 10 MB had kicked in about a minute earlier, and the other nodes were marked DOWN after it completed. In other words, in system.log we see:

    00:00:00 Compacting ....
    00:00:03 Compacted 8 sstables ... 10~ megabytes
    00:01:06 InetAddress /x.x.x.4 is now DOWN
    00:01:06 InetAddress /x.x.x.3 is now DOWN
    00:01:06 InetAddress /x.x.x.1 is now DOWN

There was no significant GC activity in gc.log.

We have heard that heavy compaction activity can cause this behavior, but we cannot work out logically why it would happen. How could a compaction operation stop the Gossip thread from performing its heartbeat checks? Has anyone experienced this kind of behavior before?

Thanks in advance!

Sincerely,

Michael Fong