Hi, I'm running a 3-node Cassandra 2.1.x cluster. Each node has 8 vCPUs and 30 GB of RAM. The replication factor is 3 for my keyspace.
Recently I have been using the Java Driver (within Storm) to read and write data, and I've run into a problem. All of my cluster nodes are successfully discovered by the driver. However, when I put a fairly heavy load on the cluster (1k reads and 3k writes per second), one of my nodes gets overwhelmed, by a lot, while the other nodes are fine:

Node 1: load 17
Node 2: load 3
Node 3: load 3

RAM usage is not a problem at all.

On node 1, system.log contains a lot of StatusLogger output:

INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.range_xfers 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.compactions_in_progress 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.peers 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.schema_keyspaces 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.schema_usertypes 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.local 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.sstable_activity 632,27087
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.schema_columns 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.batchlog 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - keyspace1.Counter3 0,0
INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - keyspace1.standard1 0,0
INFO [Service Thread] 2016-05-25 15:35:04,531 StatusLogger.java:115 - keyspace1.counter1 0,0
INFO [Service Thread] 2016-05-25 15:35:04,531 StatusLogger.java:115 - system_traces.sessions 0,0
INFO [Service Thread] 2016-05-25 15:35:04,532 StatusLogger.java:115 - system_traces.events 0,0
INFO [Service Thread] 2016-05-25 15:39:04,438 GCInspector.java:258 - ParNew GC in 432ms. CMS Old Gen: 2035104888 -> 2040946040; Par Eden Space: 671088640 -> 0; Par Survivor Space: 83884256 -> 83872168
INFO [Service Thread] 2016-05-25 15:39:04,438 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [Service Thread] 2016-05-25 15:39:04,439 StatusLogger.java:66 - MutationStage 0 0 12598562 0 0
INFO [Service Thread] 2016-05-25 15:39:04,439 StatusLogger.java:66 - RequestResponseStage 0 0 9124551 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - ReadRepairStage 0 0 286466 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - CounterMutationStage 0 0 0 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - ReadStage 0 0 3090180 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - MiscStage 0 0 0 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - HintedHandoff 0 0 14 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - GossipStage 0 0 99815 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - CacheCleanupExecutor 0 0 0 0 0
INFO [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 - InternalResponseStage 0 0 0 0 0

There are more GCInspector messages like this one:

INFO [Service Thread] 2016-05-25 15:35:04,524 GCInspector.java:258 - ParNew GC in 266ms. CMS Old Gen: 2029659880 -> 2035104888; Par Eden Space: 671088640 -> 0; Par Survivor Space: 83885104 -> 83884256

All of my nodes are configured exactly the same way. With the cassandra-stress tool, I was able to reach 40k to 75k operations per second without trouble.

Can someone help me debug this problem? Is there a problem with the Java Driver? Is the load balancing not "working"? How can I list the connections on a node?

Regards,
Bastien
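
To compare GC pauses between node 1 and the other two nodes, the GCInspector lines in system.log can be summarized with a small standalone program. This is only a minimal sketch, assuming the log line format shown in this post; the class GcPauseSummary and its methods are hypothetical names, not part of Cassandra or the Java Driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Cassandra or the Java Driver.
final class GcPauseSummary {
    // Assumed format, taken from the log excerpt above:
    // "... GCInspector.java:258 - ParNew GC in 432ms. ..."
    private static final Pattern PAUSE =
            Pattern.compile("GCInspector\\.java:\\d+ - (\\w+) GC in (\\d+)ms");

    // Returns the pause in milliseconds, or empty if the line is not a GCInspector line.
    static OptionalLong parsePauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        if (m.find()) {
            return OptionalLong.of(Long.parseLong(m.group(2)));
        }
        return OptionalLong.empty();
    }

    // Collects every GC pause found in the given log lines, in order of appearance.
    static List<Long> pauses(List<String> lines) {
        List<Long> out = new ArrayList<>();
        for (String line : lines) {
            OptionalLong p = parsePauseMillis(line);
            if (p.isPresent()) {
                out.add(p.getAsLong());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two GCInspector lines and one StatusLogger line copied from the post.
        List<String> sample = Arrays.asList(
            "INFO [Service Thread] 2016-05-25 15:35:04,524 GCInspector.java:258 - ParNew GC in 266ms.",
            "INFO [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 - system.peers 0,0",
            "INFO [Service Thread] 2016-05-25 15:39:04,438 GCInspector.java:258 - ParNew GC in 432ms.");
        List<Long> found = pauses(sample);
        long max = 0;
        for (long p : found) {
            max = Math.max(max, p);
        }
        System.out.println("GC pauses (ms): " + found + ", longest: " + max);
    }
}
```

Running the same summary over system.log from each node would show whether the long ParNew pauses are specific to node 1 or appear on all three nodes.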