Hi All,

We are facing a very strange issue in our C* ring. We are using C* v1.2.4, with 7 nodes in DC1, 3 nodes in DC2 and 3 nodes in DC3. We have been testing read/write performance in DC1 with different disk configurations. For instance, node1-DC1 uses JBOD and node2-DC1 uses RAID-0.

Everything seemed to be running fine over the last week until yesterday, when node2-DC1 (the RAID-0 node) stopped responding to client requests and queries started timing out. The JMX console showed up to 25k Thrift threads running, no pending compactions, and a lot of pending reads, but nothing else unusual. CPU is averaging about 10%, and heap usage is about 4GB of the 8GB available.

Node2-DC1 became unresponsive, but the other nodes were still trying to query it, and gossip did not flag it as dead or unresponsive. I am wondering if this is a bug.

After stopping Thrift on node2-DC1, the log file shows:

INFO [ScheduledTasks:1] 2013-06-14 09:29:37,433 StatusLogger.java (line 53) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) ReadStage 30 959 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line 68) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line 68) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,566 StatusLogger.java (line 68) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,566 StatusLogger.java (line 68) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,567 StatusLogger.java (line 68) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,567 StatusLogger.java (line 68) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,568 StatusLogger.java (line 68) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,568 StatusLogger.java (line 68) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,569 StatusLogger.java (line 68) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,569 StatusLogger.java (line 68) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,570 StatusLogger.java (line 68) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,570 StatusLogger.java (line 68) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,572 StatusLogger.java (line 73) CompactionManager 0 0
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,573 StatusLogger.java (line 85) MessagingService n/a 0,42
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 95) Cache Type Size Capacity KeysToSave Provider
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 96) KeyCache 602369792 1048576000 all
INFO [ScheduledTasks:1] 2013-06-14 09:29:37,574 StatusLogger.java (line 102) RowCache 0 0 all

Any hint to track down this error would be useful.

Many Thanks,
Haithem
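
P.S. In case it helps anyone scan similar StatusLogger output, here is a minimal Python sketch that picks out thread pools with a large pending backlog. It is only an illustration: the function name `backlogged_pools` and the default threshold of 100 are my own assumptions, and the regular expression is written against the line layout shown in the log above, which may differ in other Cassandra versions.

```python
import re

# Matches thread-pool rows of the form seen in the log above, e.g.
#   INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) ReadStage 30 959 0
# Rows with fewer than three numeric columns (CompactionManager, MessagingService,
# the cache rows and the header rows) deliberately do not match.
POOL_ROW = re.compile(
    r"StatusLogger\.java \(line \d+\)\s+"
    r"(?P<pool>\S+)\s+(?P<active>\d+)\s+(?P<pending>\d+)\s+(?P<blocked>\d+)\s*$"
)


def backlogged_pools(log_lines, pending_threshold=100):
    """Return (pool, active, pending, blocked) tuples whose pending count
    exceeds pending_threshold (hypothetical default, tune as needed)."""
    hits = []
    for line in log_lines:
        match = POOL_ROW.search(line)
        if match is None:
            continue  # not a thread-pool row
        pending = int(match.group("pending"))
        if pending > pending_threshold:
            hits.append(
                (
                    match.group("pool"),
                    int(match.group("active")),
                    pending,
                    int(match.group("blocked")),
                )
            )
    return hits


if __name__ == "__main__":
    sample = [
        "INFO [ScheduledTasks:1] 2013-06-14 09:29:37,564 StatusLogger.java (line 68) ReadStage 30 959 0",
        "INFO [ScheduledTasks:1] 2013-06-14 09:29:37,565 StatusLogger.java (line 68) MutationStage 0 0 0",
    ]
    print(backlogged_pools(sample))
```

Run against the lines above, the only pool it reports is ReadStage (30 active, 959 pending), which matches what we saw in JMX.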