Hi there,

We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2.
As our data volume grows (34 TB now), we are running into some problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1, CL.LOCAL_ONE). At the same time, the load average is about 20-30 on all DC1 nodes (8-core CPU, 32 GB RAM), and C* starts timing out connections.

In the same scenario, OpsCenter has some issues as well. It resets the layout of all graphs and reverts to the default layout on every refresh, and it does not go back to normal after the load decreases. I only managed to restore OpsCenter to its normal behavior by reinstalling it.

Just for reference, we are using SATA HDDs on all nodes. When we run hdparm to check disk performance under this load, some nodes report very low read rates (under 10 MB/sec), while others report above 100 MB/sec. When the load average is low, this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete. The last repair was 20 days ago. Running repair under high load brings some nodes down with the exception:

"JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space"

Any hints?

Regards,
Roni Balthazar