2 hour bout of pending gossip, pending mutations, high CPU, high ParNew, dropped messages

Thunder Stumpges Thu, 24 Apr 2014 10:22:24 -0700

Hi all,

I am looking into an issue we ran into last night with a single node in our
three-node 2.0.6 cluster. The top-level symptoms were timed-out writes and
high read and write latency.

Looking into it more, the node experienced all of the following during a
two-hour window, and eventually recovered on its own.

** "Gossip stage" pending tasks **
 WARN [GossipTasks:1] 2014-04-23 18:51:36,231 Gossiper.java (line 612) Gossip stage has 2 pending tasks; skipping status check (no nodes will be marked down)
 WARN [GossipTasks:1] 2014-04-23 18:52:36,910 Gossiper.java (line 612) Gossip stage has 2 pending tasks; skipping status check (no nodes will be marked down)
 WARN [GossipTasks:1] 2014-04-23 18:52:47,886 Gossiper.java (line 612) Gossip stage has 2 pending tasks; skipping status check (no nodes will be marked down)
 WARN [GossipTasks:1] 2014-04-23 18:53:15,094 Gossiper.java (line 612) Gossip stage has 2 pending tasks; skipping status check (no nodes will be marked down)


The strange thing is that GossipStage never showed any pending tasks in the
tpstats output logged by the status logger:
 INFO [ScheduledTasks:1] 2014-04-23 18:56:06,581 StatusLogger.java (line 70) GossipStage                       0         0        9065668         0                 0

** High CPU - ~50%-60% on these dual hexa-core boxes is pretty crazy.
Normal is barely moving the needle at 3%.
** High level of ParNew collections - likely the cause of the CPU usage,
considering it was running these ParNew collections every couple hundred
ms. The CMS generation seemed OK at 4GB of 6GB, and not much remained in
the new generation after each ParNew collection:
"Heap after GC invocations=151586 (full 137):
 par new generation   total 1887488K, used 147K"

** Backed up Mutations in Mutation stage of TPStats and dropped messages:
 2014-04-23 18:56:06,579 MessagingService.java (line 841) 210 MUTATION messages dropped in last 5000ms
 2014-04-23 18:56:06,579 MessagingService.java (line 841) 12 READ_REPAIR messages dropped in last 5000ms
 2014-04-23 18:56:06,579 Pool Name               Active   Pending   Completed   Blocked  All Time Blocked
 2014-04-23 18:56:06,580 ReadStage                    4        10   398908067         0                 0
 2014-04-23 18:56:06,580 RequestResponseStage         0         0   178297428         0                 0
 2014-04-23 18:56:06,581 ReadRepairStage              0         0    33509717         0                 0
 2014-04-23 18:56:06,581 MutationStage               96     12708   107009834         0                 0
 2014-04-23 18:56:06,581 ReplicateOnWriteStage        0         0           0         0                 0
 2014-04-23 18:56:06,581 GossipStage                  0         0     9065668         0                 0
 2014-04-23 18:56:06,582 AntiEntropyStage             0         0     1413264         0                 0
 2014-04-23 18:56:06,582 MigrationStage               0         0          37         0                 0
 2014-04-23 18:56:06,582 MemtablePostFlusher          0         0      546841         0                 0
 2014-04-23 18:56:06,582 MemoryMeter                  0         0         234         0                 0
 2014-04-23 18:56:06,583 FlushWriter                  0         0      165232         0                12
 2014-04-23 18:56:06,583 MiscStage                    0         0      360672         0                 0
 2014-04-23 18:56:06,583 PendingRangeCalculator       0         0           5         0                 0
 2014-04-23 18:56:06,583 commitlog_archiver           0         0           0         0                 0
 2014-04-23 18:56:06,584 InternalResponseStage        0         0      358384         0                 0
 2014-04-23 18:56:06,584 AntiEntropySessions          0         0       78366         0                 0
 2014-04-23 18:56:06,584 HintedHandoff                0         0          28         0                 0
 2014-04-23 18:56:06,585 CompactionManager            0         0
 2014-04-23 18:56:06,585 Commitlog                  n/a         0
 2014-04-23 18:56:06,585 MessagingService           n/a       0/0

Any ideas, anyone? Could it all have been caused by the backed-up gossip
tasks? Would that somehow also cause the MutationStage backups? I find it
really strange that the GossipTasks logger kept saying gossip tasks were
pending, but they never showed up in tpstats in the status logger.

thanks in advance for any insight,
Thunder
