I started digging in the log files, and here's the full log of a crash. We have 4GB of heap, and whether you watch it in OpsCenter or through a JMX console, the used heap size always stays below 3GB, swinging between 1.5GB and 3GB every 5 minutes or so. So why would it suddenly run out of heap space?
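(For anyone trying to catch this in the act: polling used-heap over JMX is too coarse to see a CMS collector falling behind; verbose GC logging shows it directly. The stock cassandra-env.sh has a commented-out block along these lines; the log path below is just an example:)

    # cassandra-env.sh: GC logging options (shipped commented out);
    # the log path is only an example, put it wherever suits
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"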
INFO 14:08:50,078 Completed flushing /data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db (5328100 bytes)
INFO 14:08:50,079 4103 MUTATION messages dropped in last 5000ms
INFO 14:08:50,080 Pool Name Active Pending Blocked
INFO 14:08:50,086 Writing Memtable-A@1734568216(117729674/147162092 serialized/live bytes, 12097 ops)
INFO 14:08:50,089 ReadStage 32 363 0
INFO 14:08:50,089 RequestResponseStage 0 0 0
INFO 14:08:50,089 ReadRepairStage 1 1 0
INFO 14:08:50,089 MutationStage 32 5672 0
INFO 14:08:50,090 ReplicateOnWriteStage 0 0 0
INFO 14:08:50,090 GossipStage 0 0 0
INFO 14:08:50,090 AntiEntropyStage 0 0 0
INFO 14:08:50,090 MigrationStage 0 0 0
INFO 14:08:50,090 StreamStage 0 0 0
INFO 14:08:50,091 MemtablePostFlusher 1 6 0
INFO 14:08:50,091 FlushWriter 1 5 0
INFO 14:08:50,091 MiscStage 0 0 0
INFO 14:08:50,091 InternalResponseStage 0 0 0
INFO 14:08:50,094 Compacting [SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-9-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-8-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/system/HintsColumnFamily-hd-7-Data.db')]
INFO 14:08:50,098 HintedHandoff 0 0 0
INFO 14:08:50,098 CompactionManager 0 2
INFO 14:08:50,099 MessagingService n/a 0,1
INFO 14:08:50,099 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
INFO 14:08:50,099 system.NodeIdInfo 0,0 0/0 0/0
INFO 14:08:50,099 system.IndexInfo 0,0 0/0 0/0
INFO 14:08:50,099 system.LocationInfo 0,0 0/0 0/0
INFO 14:08:50,100 system.Versions 0,0 0/0 0/0
INFO 14:08:50,107 system.Migrations 0,0 0/0 0/0
INFO 14:08:50,107 system.HintsColumnFamily 0,0 0/0 0/0
INFO 14:08:50,107 system.Schema 0,0 0/0 0/0
INFO 14:08:50,108 K.A 920,12311092 0/0 0/0
INFO 14:08:50,127 K.B 111,1005040 0/0 6000/6000
INFO 14:08:50,127 K.C 13,23283 0/0 211817/211817
INFO 14:08:50,127 OpsCenter.rollups300 0,0 0/0 4/4
INFO 14:08:50,128 OpsCenter.rollups60 1876,1452889 0/0 30/30
INFO 14:08:50,128 OpsCenter.rollups7200 0,0 0/0 30/30
INFO 14:08:50,137 Enqueuing flush of Memtable-rollups60@33961585(43148/1452889 serialized/live bytes, 1876 ops)
INFO 14:08:50,160 OpsCenter.rollups86400 0,0 0/0 0/0
INFO 14:08:50,169 OpsCenter.events 0,0 0/0 13/13
INFO 14:08:50,174 OpsCenter.events_timeline 0,0 0/0 0/0
INFO 14:08:50,175 OpsCenter.settings 0,0 0/0 0/0
INFO 14:08:50,175 OpsCenter.pdps 470,103983 0/0 0/0
INFO 14:09:09,860 GC for ConcurrentMarkSweep: 2560 ms for 6 collections, 3669729184 used; max is 4118806528
WARN 14:09:09,860 Heap is 0.8909690608317886 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN 14:09:09,860 Flushing CFS(Keyspace='K', ColumnFamily='A') to relieve memory pressure
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/lib/cassandra/java_1346605112.hprof ...
Heap dump file created [5553296481 bytes in 67.950 secs]
INFO 14:11:54,810 InetAddress /10.100.1.116 is now dead.
INFO 14:11:54,813 InetAddress /10.100.1.117 is now dead.
INFO 14:11:54,814 InetAddress /10.100.1.118 is now dead.
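That WARN is Cassandra's emergency valve kicking in, and it is configurable. For reference, the stock 1.0.x cassandra.yaml thresholds are, as far as I can tell:

    # cassandra.yaml memory-pressure valves (1.0.x defaults, for reference)
    # flush the largest memtables once a full GC leaves the heap this full:
    flush_largest_memtables_at: 0.75
    # shrink the row/key caches once the heap is this full:
    reduce_cache_sizes_at: 0.85
    reduce_cache_capacity_to: 0.6

Note that the valve only fired once the heap was already ~89% full, and the OOM followed within seconds, so it triggered far too late to help. Right after that, the compaction thread dies: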
ERROR 14:11:54,830 Fatal exception in thread Thread[CompactionExecutor:565,1,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArrayOutputStream.java:104)
    at org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:220)
    at java.io.DataOutputStream.write(Unknown Source)
    at org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:323)
    at org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:316)
    at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:63)
    at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:361)
    at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:334)
    at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:89)
    at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:136)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:135)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:115)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
INFO 14:11:54,839 InetAddress /10.100.1.116 is now UP
INFO 14:11:54,849 Stop listening to thrift clients
INFO 14:11:54,850 InetAddress /10.100.1.117 is now UP
INFO 14:11:54,850 InetAddress /10.100.1.118 is now UP
INFO 14:11:54,862 Waiting for messaging service to quiesce
INFO 14:12:06,909 MessagingService shutting down server thread.
ERROR 14:12:06,912 Fatal exception in thread Thread[Thread-117,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
ERROR 14:12:06,913 Fatal exception in thread Thread[Thread-116,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
ERROR 14:12:19,466 Fatal exception in thread Thread[Thread-118,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)
INFO 14:12:19,466 Started hinted handoff for token: 1 with IP: /10.100.1.116
INFO 14:13:33,343 Completed flushing /data-ssd/cassandra/data/K/A-hd-279743-Data.db (43508009 bytes)
INFO 14:13:33,346 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') (estimated 0 bytes)
INFO 14:13:33,350 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='rollups86400') (estimated 0 bytes)
INFO 14:13:33,351 Writing Memtable-B@797498287(674046/53923680 serialized/live bytes, 1388162 ops)
INFO 14:13:33,349 Compacting [SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279741-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279235-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279743-Data.db'), SSTableReader(path='/data-ssd/cassandra/data/K/A-hd-279506-Data.db')]
INFO 14:13:33,351 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='events') (estimated 0 bytes)
INFO 14:13:33,350 Enqueuing flush of Memtable-A@884602824(10126517/12658146 serialized/live bytes, 948 ops)
INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='events_timeline') (estimated 0 bytes)
INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='settings') (estimated 0 bytes)
INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='pdps') (estimated 103983 bytes)
INFO 14:14:50,122 Completed flushing /data-ssd/cassandra/data/K/B-hd-14662-Data.db (5340050 bytes)
INFO 14:14:50,123 26225 MUTATION messages dropped in last 5000ms
INFO 14:14:50,123 Writing Memtable-C@841365631(436486/8776427 serialized/live bytes, 8752 ops)
INFO 14:14:50,123 9 READ_REPAIR messages dropped in last 5000ms
INFO 14:14:50,131 1519 READ messages dropped in last 5000ms
INFO 14:14:50,146 Enqueuing flush of Memtable-pdps@1698224970(39228/103983 serialized/live bytes, 470 ops)
INFO 14:14:50,167 Pool Name Active Pending Blocked
INFO 14:14:50,169 ReadStage 20 20 0
INFO 14:14:50,169 RequestResponseStage 0 0 0
INFO 14:14:50,170 ReadRepairStage 1 1 0
INFO 14:14:50,170 ReplicateOnWriteStage 0 0 0
INFO 14:14:50,171 GossipStage 0 0 0
INFO 14:14:50,171 AntiEntropyStage 0 0 0
INFO 14:14:50,171 MigrationStage 0 0 0
INFO 14:14:50,172 StreamStage 0 0 0
INFO 14:14:50,172 MemtablePostFlusher 1 5 0
INFO 14:14:50,172 FlushWriter 1 5 1
INFO 14:14:50,173 MiscStage 0 0 0
INFO 14:14:50,173 InternalResponseStage 0 0 0
INFO 14:14:50,174 HintedHandoff 1 3 0
INFO 14:14:50,174 CompactionManager 1 4
INFO 14:14:50,174 MessagingService n/a 0,0
INFO 14:14:50,174 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
INFO 14:14:50,194 system.NodeIdInfo 0,0 0/0 0/0
INFO 14:14:50,194 system.IndexInfo 0,0 0/0 0/0
INFO 14:14:50,194 system.LocationInfo 0,0 0/0 0/0
INFO 14:14:50,195 system.Versions 0,0 0/0 0/0
INFO 14:14:50,195 system.Migrations 0,0 0/0 0/0
INFO 14:14:50,195 system.HintsColumnFamily 13,964160 0/0 0/0
INFO 14:14:50,195 system.Schema 0,0 0/0 0/0
INFO 14:14:50,195 K.A 0,0 0/0 0/0
INFO 14:14:50,195 K.B 115,1037040 0/0 6000/6000
INFO 14:14:50,196 K.C 13,23283 0/0 211817/211817
INFO 14:14:50,196 OpsCenter.rollups300 0,0 0/0 4/4
INFO 14:14:50,196 OpsCenter.rollups60 0,0 0/0 30/30
INFO 14:14:50,196 OpsCenter.rollups7200 0,0 0/0 30/30
INFO 14:14:50,196 OpsCenter.rollups86400 0,0 0/0 0/0
INFO 14:14:50,196 OpsCenter.events 0,0 0/0 13/13
INFO 14:14:50,197 OpsCenter.events_timeline 0,0 0/0 0/0
INFO 14:14:50,197 OpsCenter.settings 0,0 0/0 0/0
INFO 14:14:50,197 OpsCenter.pdps 0,0 0/0 0/0
Service exit with a return value of 100

On Tue, Sep 11, 2012 at 3:57 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> It's impossible to start new connections, or impossible to send requests, or it just doesn't return anything when you've sent a request.
>
> If it's totally frozen it sounds like GC. How long does it freeze for?
>
>> Despite that, we occasionally get OOM exceptions, and nodes crashing, maybe a few times per month.
>
> Do you have an error stack?
>
>> We can't find anything in the cassandra logs indicating that something's up
>
> Is it logging dropped messages or high TP pending? Are the freezes associated with compaction or repair running?
>
>> and occasionally we do bulk deletion of supercolumns in a row.
>
> mmm, are you sending a batch mutation with lots-o-deletes? Each row mutation (insert or delete) in the batch becomes a thread pool task. If you send 1,000 rows in a batch you will temporarily prevent other requests from being served.
>
>> The config options we are unsure about are things like commit log sizes, ….
>
> I would try to find some indication of what's going on before tweaking. Have you checked iostat?
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/09/2012, at 2:05 AM, Henrik Schröder <skro...@gmail.com> wrote:
>
>> Hi all,
>>
>> We're running a small Cassandra cluster (v1.0.10) serving data to our web application, and as our traffic grows, we're starting to see some weird issues. The biggest of these is that sometimes a single node becomes unresponsive. It's impossible to start new connections, or impossible to send requests, or it just doesn't return anything when you've sent a request. Our client library is set to retry on another server when this happens, and what we see then is that the request is usually served instantly. So it's not the case that some requests are very difficult; it's that sometimes a node is just "busy", and we have no idea why or what it's doing.
>>
>> We're using MRTG and Monit to monitor the servers, and in MRTG the average CPU usage is around 5% on our quad-core Xeon servers with SSDs. But occasionally through Monit we can see that the 1-min load average goes above 4, and this usually corresponds to the above issues. Is this common? Does this happen to everyone else? And why the spikiness in load? We can't find anything in the Cassandra logs indicating that something's up (such as a slow GC or compaction), and there's no corresponding traffic spike in the application either. Should we just add more nodes if any single one gets CPU spikes?
>>
>> Another explanation could also be that we've configured it wrong. We're running pretty much the default config. Each node has 16GB of RAM, 4GB of heap, no row cache, and a sizeable key cache. Despite that, we occasionally get OOM exceptions and nodes crashing, maybe a few times per month. Should we increase the heap size? Or move to 1.1 and enable off-heap caching?
>>
>> We have quite a lot of traffic to the cluster: a single keyspace with two column families, RF=3, compression enabled, and the default size-tiered compaction.
>> Column family A has 60GB of actual data; each row has a single column, and that column holds binary data that varies in size up to 500kB. When we update a row, we write a new value to this single column, effectively replacing the entire row. We do ~1000 updates/s, totalling ~10MB/s in writes.
>> Column family B also has 60GB of actual data, but each row has a variable (~100-10000) number of supercolumns, and each supercolumn has a fixed number of columns with a fixed amount of data, totalling ~1kB. On this column family we add supercolumns to rows at a rate of ~200/s, and occasionally we do bulk deletions of supercolumns in a row.
>>
>> The config options we are unsure about are things like commit log sizes, memtable flushing thresholds, commit log syncing, compaction throughput, etc. Are we at a point with our data size and write load where the defaults aren't good enough anymore? Should we stick with size-tiered compaction, even though our application is update-heavy?
>>
>> Many thanks,
>> /Henrik
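P.S. On Aaron's batch-mutation point: if our bulk supercolumn deletes really do go out as one huge batch_mutate, capping the rows per call should stop them starving everything else, since (per Aaron) each row mutation in a batch becomes its own MutationStage task. A rough sketch against the raw Thrift API follows; the helper, the chunk size, and the consistency level are placeholders to tune, not code from our app:

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Deletion;
    import org.apache.cassandra.thrift.Mutation;

    public class ChunkedDeletes {
        // Hypothetical helper: delete one supercolumn from many rows, a few
        // rows per batch_mutate call, so each call only queues a handful of
        // MutationStage tasks instead of thousands (the pending count hit
        // 5672 in the log above).
        static void deleteSuperColumn(Cassandra.Client client, String cf,
                                      List<ByteBuffer> rowKeys, ByteBuffer superColumn,
                                      long timestamp, int chunkSize) throws Exception {
            for (int i = 0; i < rowKeys.size(); i += chunkSize) {
                Map<ByteBuffer, Map<String, List<Mutation>>> batch =
                        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
                for (ByteBuffer key : rowKeys.subList(i, Math.min(i + chunkSize, rowKeys.size()))) {
                    Deletion d = new Deletion();
                    d.setTimestamp(timestamp);
                    d.setSuper_column(superColumn); // drop the whole supercolumn, no predicate
                    Mutation m = new Mutation();
                    m.setDeletion(d);
                    Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
                    byCf.put(cf, Arrays.asList(m));
                    batch.put(key, byCf);
                }
                client.batch_mutate(batch, ConsistencyLevel.QUORUM);
            }
        }
    }

The same idea applies through whatever client library is in use: keep the per-call row count small enough that other requests can interleave between batches.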