Hi there,

We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2.

As our data volume grows (34 TB now), we are running into some
problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1,
CL.LOCAL_ONE). At the same time, the load average is about 20-30 on
all DC1 nodes (8-core CPU, 32 GB RAM), and C* starts timing out
connections.
OpsCenter also has issues in this scenario: it resets all graph
layouts back to the default layout on every refresh, and it does not
return to normal after the load decreases. The only way I managed to
restore OpsCenter's normal behavior was to reinstall it.
Just for reference, we are using SATA HDDs on all nodes. When running
hdparm to check disk performance under this load, some nodes report
very low read rates (under 10 MB/sec), while others report above
100 MB/sec. Under a low load average, this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete, and the last repair
was 20 days ago. Running repair under high load brings some nodes
down with the exception: "JVMStabilityInspector.java:94 - JVM state
determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space"

Any hints?

Regards,

Roni Balthazar
