Thanks for the response. We are on version 1.1.2. We don't see the MutationStage back up. The dump from the messages dropped error doesn't show a backup, but also watching "nodetool tpstats" doesn't show any backup there.
nodetool info also shows we have over a gig of available memory on the JVM heap of each node. The earliest GCInspector traces I see before one of the more recent incidents in which messages were dropped are: INFO [ScheduledTasks:1] 2012-09-18 02:25:53,928 GCInspector.java (line 122) GC for ParNew: 396 ms for 1 collections, 2064505088 used; max is 4253024256 NFO [ScheduledTasks:1] 2012-09-18 02:25:55,929 GCInspector.java (line 122) GC for ParNew: 485 ms for 1 collections, 1961875064 used; max is 4253024256 INFO [ScheduledTasks:1] 2012-09-18 02:25:57,930 GCInspector.java (line 122) GC for ParNew: 265 ms for 1 collections, 1968074096 used; max is 4253024256 But this was 45 minutes before messages were dropped. It's appreciated, -Mike On Sep 17, 2012, at 11:27 PM, aaron morton wrote: >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) >> MemtablePostFlusher 1 5 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) >> FlushWriter 1 5 0 > Looks suspiciously like > http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3c9fb0e801-b1ed-41c4-9939-bafbddf15...@thelastpickle.com%3E > > What version are you on ? > > Are there any ERROR log messages before this ? > > Are you seeing MutationStage back up ? > > Are you see log messages from GCInspector ? > > Cheers > > ----------------- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 18/09/2012, at 2:16 AM, Michael Theroux <mthero...@yahoo.com> wrote: > >> Hello, >> >> While under load, we have occasionally been seeing "messages dropped" errors >> in our cassandra log. Doing some research, I understand this is part of >> Cassandra's design to shed load, and we should look at the tpstats-like >> output to determine what should be done to resolve the situation. >> Typically, you will see lots of messages blocked or pending, and that might >> be an indicator that a specific part of hardware needs to be >> improved/tuned/upgraded. >> >> However, looking at the output we are getting, I'm finding it difficult to >> see what needs to be tuned, as it looks to me cassandra is handling the load >> within the mutation stage: >> >> NFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line >> 658) 119 MUTATION messages dropped in last 5000ms >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) >> Pool Name Active Pending Blocked >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java (line 72) >> ReadStage 3 3 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) >> RequestResponseStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) >> ReadRepairStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) >> MutationStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) >> ReplicateOnWriteStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) >> GossipStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) >> AntiEntropyStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) >> MigrationStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) >> StreamStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) >> MemtablePostFlusher 1 5 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) >> FlushWriter 1 5 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) >> MiscStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) >> commitlog_archiver 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) >> InternalResponseStage 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) >> AntiEntropySessions 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) >> HintedHandoff 0 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) >> CompactionManager 0 0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) >> MessagingService n/a 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) >> Cache Type Size Capacity >> KeysToSave Provider >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) >> KeyCache 2184533 2184533 >> all >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 106) >> RowCache 0 0 >> all org.apache.cassandra.cache.SerializingCacheProvider >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) >> ColumnFamily Memtable ops,data >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 116) >> system.NodeIdInfo 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) >> system.IndexInfo 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) >> system.LocationInfo 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) >> system.Versions 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) >> system.schema_keyspaces 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) >> system.Migrations 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java (line 116) >> system.schema_columnfamilies 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) >> system.schema_columns 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) >> system.HintsColumnFamily 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) >> system.Schema 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) >> prod.comp 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) >> prod.byp 224,61863 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) >> prod.byn 30,6106 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) >> prod.p 3354,3792954 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) >> prod.rel 788,153294 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) >> prod.i 46,271517 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) >> prod.users 18489,5650405 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) >> prod.caches 40,7512998 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) >> prod.sessions 8085,13063121 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) >> prod.content 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) >> prod.enr 436,341190 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java (line 116) >> prod.byt 23800,9880339 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) >> prod.aliases 63603,10475769 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) >> prod.ran 1500,329387 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) >> prod.cmap 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) >> prod.pmap 73032,61237450 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) >> prod.pic 5421,4438066 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) >> prod.watchdog 0,0 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) >> prod.s 31,40924 >> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java (line 116) >> prod.m 21690,64262284 >> >> >> We are using a write consistency level of LOCAL_QUROUM over a six node >> cluster. Any observations are welcome, >> >> Thanks, >> -Mike >