Hi,

I'm running a 3-node Cassandra 2.1.x cluster. Each node has 8 vCPUs and 30
GB of RAM.
The replication factor is 3 for my keyspace.

Recently I've been using the Java Driver (within Storm) to read/write data,
and I've encountered a problem:

All of my cluster nodes are successfully discovered by the driver.

When putting a pretty heavy load on my cluster (1k reads and 3k writes per
second), one of my nodes gets overwhelmed, badly, while the other nodes are
fine:
Node 1: load 17
Node 2: load 3
Node 3: load 3

RAM usage is not a problem at all.

On node 1, system.log contains a lot of StatusLogger output:

INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.range_xfers                        0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.compactions_in_progress                 0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.peers                              0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.schema_keyspaces                   0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.schema_usertypes                   0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.local                              0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.sstable_activity             632,27087
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.schema_columns                     0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
system.batchlog                           0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
keyspace1.Counter3                        0,0
INFO  [Service Thread] 2016-05-25 15:35:04,530 StatusLogger.java:115 -
keyspace1.standard1                       0,0
INFO  [Service Thread] 2016-05-25 15:35:04,531 StatusLogger.java:115 -
keyspace1.counter1                        0,0
INFO  [Service Thread] 2016-05-25 15:35:04,531 StatusLogger.java:115 -
system_traces.sessions                    0,0
INFO  [Service Thread] 2016-05-25 15:35:04,532 StatusLogger.java:115 -
system_traces.events                      0,0
INFO  [Service Thread] 2016-05-25 15:39:04,438 GCInspector.java:258 -
ParNew GC in 432ms.  CMS Old Gen: 2035104888 -> 2040946040; Par Eden Space:
671088640 -> 0; Par Survivor Space: 83884256 -> 83872168
INFO  [Service Thread] 2016-05-25 15:39:04,438 StatusLogger.java:51 - Pool
Name                    Active   Pending      Completed   Blocked  All Time
Blocked
INFO  [Service Thread] 2016-05-25 15:39:04,439 StatusLogger.java:66 -
MutationStage                     0         0       12598562
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,439 StatusLogger.java:66 -
RequestResponseStage              0         0        9124551
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
ReadRepairStage                   0         0         286466
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
CounterMutationStage              0         0              0
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
ReadStage                         0         0        3090180
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
MiscStage                         0         0              0
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
HintedHandoff                     0         0             14
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
GossipStage                       0         0          99815
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
CacheCleanupExecutor              0         0              0
0                 0
INFO  [Service Thread] 2016-05-25 15:39:04,440 StatusLogger.java:66 -
InternalResponseStage             0         0              0
0                 0

There are more GCInspector messages like this:
INFO  [Service Thread] 2016-05-25 15:35:04,524 GCInspector.java:258 -
ParNew GC in 266ms.  CMS Old Gen: 2029659880 -> 2035104888; Par Eden Space:
671088640 -> 0; Par Survivor Space: 83885104 -> 83884256

All of my nodes are configured exactly the same way.

With the cassandra-stress tool, I was able to hit 40k to 75k operations per
second without any problem.

Can someone help me debug this problem?

Is there a problem with the Java Driver? Is the load balancing not
working? How can I list connections on a node?
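If it helps, here is a sketch of how I understand the load balancing policy
can be set explicitly when building the Cluster with the 2.1 Java driver
(the contact point is a placeholder; as far as I know, token-aware wrapping
of the datacenter-aware round-robin policy is the default in this driver
version anyway):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterSetup {
    public static void main(String[] args) {
        // TokenAwarePolicy routes each request to a replica owning the
        // partition key, so coordinator work should spread across nodes.
        // "10.0.0.1" is a placeholder contact point.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
        Session session = cluster.connect();
        // Prepared statements with bound partition keys are what allow
        // the token-aware policy to pick the right replica.
        cluster.close();
    }
}
```

For listing connections, on the node itself `netstat -tn | grep 9042`
shows sockets on the native protocol port, and `nodetool tpstats` shows
the same thread pool activity that StatusLogger prints above. Is that the
right way to check which node clients are actually hitting?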

Regards,
Bastien
