Thanks for your reply! There are 3 column families, all created by kairosdb, and one of them takes almost all of the workload. I didn't tune the heap size, so it uses the default, which should be 3GB. I have monitored CPU and memory usage: CPU usage averages about 30%, and available memory averages about 1.5G, so memory usage is about 87% on average. My workload is generated by test programs and is stable and periodic. Before the test programs received error messages, there were no signs of high CPU usage or changes in memory usage. That confuses me.
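For reference, the 3GB and 87% figures can be checked with a little arithmetic. The sketch below assumes the default heap rule documented in the comments of Cassandra's cassandra-env script, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)), and the 12G of RAM mentioned in the original question. The function names are made up for illustration and are not part of Cassandra.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Estimate the default max heap in MB.

    Assumption: the rule is max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)),
    as described in the cassandra-env script comments.
    """
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


def used_memory_percent(total_gb: float, available_gb: float) -> float:
    """Percentage of physical memory in use, given total and available."""
    return 100.0 * (total_gb - available_gb) / total_gb


if __name__ == "__main__":
    # 12G of RAM, as reported for this machine
    print(default_max_heap_mb(12 * 1024), "MB default heap")
    # about 1.5G available on average
    print(used_memory_percent(12, 1.5), "% memory in use")
```

Under these assumptions the heap is only a quarter of physical RAM, so overall memory usage says little about how full the JVM heap is.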

2017-07-10 17:30 GMT+08:00 Varun Barala <varunbaral...@gmail.com>:

> Hi,
>
> *How many column families are there? What is the heap size?*
>
> You can turn off logs for statusLogger.java and gc to optimize heap usage.
>
> Can you also monitor cpu usage and memory usage? IMO, in your case memory
> is the bottle-neck.
>
> Thanks!!
>
> On Mon, Jul 10, 2017 at 5:07 PM, 张强 <tzx...@gmail.com> wrote:
>
>> Hi experts, I've a single cassandra 3.11.0 node working with kairosdb (a
>> time series database), after running 4 days with stable workload, the
>> database client start to get "request errors", but there are not a lot of
>> error or warning messages in the cassandra log file, the client start to
>> receive error message at about 7-7 21:03:00, and kairosdb keep retrying
>> after that time, but there isn't much logs in the cassandra log file.
>> I've noticed the abnormal status at about 7-8 16:00:00, then I've typed a
>> "nodetool tablestats" command to get some information, the command got an
>> error, and while that time, the cassandra process start to crash, and
>> generated a dump file.
>> After C* shutdown, I take the logs to see what happened, and I found
>> something strange inside the logs.
>>
>> 1. In the system.log, there are two lines shows that no logs between
>> 2017-07-07 21:03:50 to 2017-07-08 16:07:33, I think that is a pretty long
>> period without any logs, and in gc.log file, there are a lot of logs shows
>> long time GC, that should be logged in system.log.
>>
>> INFO [ReadStage-1] 2017-07-07 21:03:50,824 NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB
>> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2017-07-08 16:07:33,347 NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with average duration of 60367.73ms, 1 have exceeded the configured commit interval by an average of 50367.73ms
>>
>> 2. In the system.log, there is a log shows very long time GC, and then
>> the C* start to close.
>>
>> WARN [ScheduledTasks:1] 2017-07-08 16:07:46,846 NoSpamLogger.java:94 - Some operations timed out, details available at debug level (debug.log)
>> WARN [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 - ConcurrentMarkSweep GC in 688850ms. CMS Old Gen: 2114938312 -> 469583832; Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 -> 25784008
>> ......
>> ERROR [Thrift:22] 2017-07-08 16:10:56,322 CassandraDaemon.java:228 - Exception in thread Thread[Thrift:22,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>
>> 3. In the debug.log, the last INFO level log is at 2017-07-07 14:43:59,
>> the log is:
>>
>> INFO [IndexSummaryManager:1] 2017-07-07 14:43:59,967 IndexSummaryRedistribution.java:75 - Redistributing index summaries
>>
>> After that, there are DEBUG level logs until 2017-07-07 21:11:34, but no
>> more INFO level or other level logs in that log file, while there are still
>> many logs in the system.log after 2017-07-07 14:43:59. Why doesn't these
>> two log files match?
>>
>> My hardware is 4 core cpu and 12G ram, and I'm using windows server 2012
>> r2.
>> Do you know what's going on here? and is there anything I can do to
>> prevent that situation?
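
The GCInspector warning quoted in the thread is easier to judge in human units. The following is a minimal sketch that parses that one line. The regular expression is written against the exact text quoted in this thread and is only an assumption about the format, not an official log grammar, and `summarize_gc` is a hypothetical helper name.

```python
import re

# Pattern written against the GCInspector line quoted in this thread;
# other collectors or Cassandra versions may format the line differently.
GC_LINE = re.compile(
    r"(?P<collector>\w+) GC in (?P<millis>\d+)ms\.\s+"
    r"CMS Old Gen: (?P<old_before>\d+) -> (?P<old_after>\d+)"
)

GIB = 2 ** 30


def summarize_gc(line: str):
    """Return the reported GC duration and old-gen sizes, or None if no match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    return {
        "collector": match.group("collector"),
        "duration_minutes": int(match.group("millis")) / 60_000,
        "old_gen_before_gib": int(match.group("old_before")) / GIB,
        "old_gen_after_gib": int(match.group("old_after")) / GIB,
    }


if __name__ == "__main__":
    sample = (
        "WARN [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 - "
        "ConcurrentMarkSweep GC in 688850ms. CMS Old Gen: 2114938312 -> 469583832; "
        "Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 -> 25784008"
    )
    for key, value in summarize_gc(sample).items():
        print(key, value)
```

For the quoted line this gives a reported collection time of roughly eleven and a half minutes, with the old generation shrinking from just under 2 GiB to under half a GiB.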