Maybe the disk I/O cannot keep up with the high mutation rate? Check the number of pending compactions on each node.
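A minimal sketch of how that check could be scripted, assuming `nodetool` is on the PATH and that `nodetool compactionstats` prints a line of the form `pending tasks: N` (the exact layout can differ between Cassandra versions, so treat the pattern as an assumption). The function names are illustrative only. Since you already collect JMX metrics, the same figure should also be available as `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks`.

```python
import re
import subprocess


def parse_pending_tasks(compactionstats_output: str) -> int:
    """Extract N from the 'pending tasks: N' line of `nodetool compactionstats`.

    Assumption: the summary line starts with 'pending tasks:' followed by an
    integer; adjust the pattern if your version formats it differently.
    """
    match = re.search(r"^pending tasks:\s*(\d+)", compactionstats_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'pending tasks' line found in output")
    return int(match.group(1))


def pending_compactions(host: str = "127.0.0.1") -> int:
    """Run `nodetool compactionstats` against one node and return its backlog."""
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True,
        text=True,
        check=True,  # raise if nodetool exits with a non-zero status
    )
    return parse_pending_tasks(result.stdout)


# Example (hypothetical host list): print the backlog for each node.
# for node in ["10.0.0.1", "10.0.0.2"]:
#     print(node, pending_compactions(node))
```

A number that keeps growing over hours, rather than draining between bursts, would suggest the disks are falling behind the write rate.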
On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester <onmstes...@zoho.com> wrote:

> Hi,
>
> I was doing 500K inserts + 100K counter updates per second on my cluster
> of 12 nodes (20 cores/128 GB RAM/4 * 600 HDD 10K) using batch statements
> with no problem.
>
> I saw a lot of warnings showing that most of the batches did not target a
> single node, so they should not be in a batch. On the other hand, the
> input load of my application increased by 50%, so I switched to non-batch
> async inserts and increased the number of client threads to handle the
> 50% higher load.
>
> The system worked for 2 days with no problem under a load of 750K inserts
> + 150K counter updates per second, but suddenly a lot of insert timeouts
> appeared in the log files.
>
> Decreasing the input load to the previous level, or even lower, did not
> help.
>
> When I restart my client (some hours after it started logging timeouts
> and errors), it works with no problem for 20 minutes but then starts
> logging timeout errors again.
>
> CPU load of the nodes in the cluster is less than 25%.
>
> How can I solve this problem? I'm saving all JMX metrics of Cassandra
> with a monitoring system. What should I check?