Hi,
*How many column families are there? What is the heap size?* You can turn off logs for statusLogger.java and gc to optimize heap usage. Can you also monitor cpu usage and memory usage? IMO, in your case memory is the bottle-neck. Thanks!! On Mon, Jul 10, 2017 at 5:07 PM, 张强 <tzx...@gmail.com> wrote: > Hi experts, I've a single cassandra 3.11.0 node working with kairosdb (a > time series database), after running 4 days with stable workload, the > database client start to get "request errors", but there are not a lot of > error or warning messages in the cassandra log file, the client start to > receive error message at about 7-7 21:03:00, and kairosdb keep retrying > after that time, but there isn't much logs in the cassandra log file. > I've noticed the abnormal status at about 7-8 16:00:00, then I've typed a > "nodetool tablestats" command to get some information, the command got an > error, and while that time, the cassandra process start to crash, and > generated a dump file. > After C* shutdown, I take the logs to see what happened, and I found > something strange inside the logs. > > 1. In the system.log, there are two lines shows that no logs between > 2017-07-07 21:03:50 to 2017-07-08 16:07:33, I think that is a pretty long > period without any logs, and in gc.log file, there are a lot of logs shows > long time GC, that should be logged in system.log. > INFO [ReadStage-1] 2017-07-07 21:03:50,824 NoSpamLogger.java:91 - Maximum > memory usage reached (512.000MiB), cannot allocate chunk of 1.000MiB > WARN [PERIODIC-COMMIT-LOG-SYNCER] 2017-07-08 16:07:33,347 > NoSpamLogger.java:94 - Out of 1 commit log syncs over the past 0.00s with > average duration of 60367.73ms, 1 have exceeded the configured commit > interval by an average of 50367.73ms > > 2. In the system.log, there is a log shows very long time GC, and then the > C* start to close. > WARN [ScheduledTasks:1] 2017-07-08 16:07:46,846 NoSpamLogger.java:94 - > Some operations timed out, details available at debug level (debug.log) > WARN [Service Thread] 2017-07-08 16:10:36,114 GCInspector.java:282 - > ConcurrentMarkSweep GC in 688850ms. CMS Old Gen: 2114938312 -> 469583832; > Par Eden Space: 837584 -> 305319752; Par Survivor Space: 41943040 -> > 25784008 > ...... > ERROR [Thrift:22] 2017-07-08 16:10:56,322 CassandraDaemon.java:228 - > Exception in thread Thread[Thrift:22,5,main] > java.lang.OutOfMemoryError: Java heap space > > 3. In the debug.log, the last INFO level log is at 2017-07-07 14:43:59, > the log is: > INFO [IndexSummaryManager:1] 2017-07-07 14:43:59,967 > IndexSummaryRedistribution.java:75 - Redistributing index summaries > After that, there are DEBUG level logs until 2017-07-07 21:11:34, but no > more INFO level or other level logs in that log file, while there are still > many logs in the system.log after 2017-07-07 14:43:59. Why doesn't these > two log files match? > > My hardware is 4 core cpu and 12G ram, and I'm using windows server 2012 > r2. > Do you know what's going on here? and is there anything I can do to > prevent that situation? > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org >