Hi, We have an application that reads data from a set of external sources and loads them into our cassandra cluster. The load goes ok for some time (~24h) and then some servers in the cluster starts flapping between being down and up, and finally they go out of memory. The cluster consists of 5 m4.xlarge machines with 16gb memory, cassandra has an 8gb heap. All machines have a high load while data is being written, with a load between 6 and 20.
I have tried sifting through the information available from nodetool, but I am unable to find anything helping me determine what is causing the oom. I am quite new to cassandra, so I might very well overlook the obvious. So any pointers on how to proceed with identifying the problem will be much appriciated :) In the following I have included information from 10.61.70.110 when it was flapping. Status for ddp keyspace(only keyspace containing any real data): ----------------------------------------------------------------------------- Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.61.70.108 65.97 GB 256 59,1% de79a554-9296-4575-8b79-2089f92069cd rack1 UN 10.61.70.110 58.95 GB 256 63,3% 310460f6-b7ce-45a7-be63-a7dd409f6b17 rack1 UN 10.61.70.72 58.17 GB 256 60,3% 44fd4f8e-18cd-4487-8174-3a22fb9ed24f rack1 UN 10.61.70.107 58.69 GB 256 58,5% f8118fc2-e340-45db-a06e-a5842107d6c8 rack1 UN 10.61.70.64 68 GB 256 58,7% 84bee9fe-2adc-48aa-915c-f43d972f5a2f rack1 ----------------------------------------------------------------------------- Snippet from system.log: ----------------------------------------------------------------------------- INFO [ScheduledTasks:1] 2016-04-12 17:35:35,547 MessagingService.java:980 - MUTATION messages were dropped in last 5000 ms: 5776 for internal timeout and 0 for cross node timeout INFO [ScheduledTasks:1] 2016-04-12 17:35:35,547 StatusLogger.java:52 - Pool Name Active Pending Completed Blocked All Time Blocked INFO [ScheduledTasks:1] 2016-04-12 17:35:35,551 StatusLogger.java:56 - MutationStage 32 4881705 17826061870 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 - ViewMutationStage 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,552 StatusLogger.java:56 - ReadStage 0 0 3266887 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 - RequestResponseStage 0 0 389429305 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,553 StatusLogger.java:56 - ReadRepairStage 0 0 322804 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - CounterMutationStage 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - MiscStage 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,554 StatusLogger.java:56 - CompactionExecutor 4 54 31305 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - MemtableReclaimMemory 0 0 3310 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - PendingRangeCalculator 0 0 10 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - GossipStage 0 0 338170 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,555 StatusLogger.java:56 - SecondaryIndexManagement 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,556 StatusLogger.java:56 - HintsDispatcher 1 4 6264 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,558 StatusLogger.java:56 - MigrationStage 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,559 StatusLogger.java:56 - MemtablePostFlush 0 0 3451 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,560 StatusLogger.java:56 - ValidationExecutor 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,561 StatusLogger.java:56 - Sampler 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,563 StatusLogger.java:56 - MemtableFlushWriter 0 0 3310 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,564 StatusLogger.java:56 - InternalResponseStage 0 0 1873184 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,565 StatusLogger.java:56 - AntiEntropyStage 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,565 StatusLogger.java:56 - CacheCleanupExecutor 0 0 0 0 0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:56 - Native-Transport-Requests 3 2 212872837 0 1796 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:66 - CompactionManager 4 32 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,566 StatusLogger.java:78 - MessagingService n/a 0/0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,567 StatusLogger.java:88 - Cache Type Size Capacity KeysToSave INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:90 - KeyCache 88654200 104857600 all INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:96 - RowCache 0 0 all INFO [ScheduledTasks:1] 2016-04-12 17:35:35,568 StatusLogger.java:103 - Table Memtable ops,data INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - ddp.fingerprint_by_content_uuid_mv 99624,17726363 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - ddp.sync 4747,1551 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,569 StatusLogger.java:106 - ddp.log_portal 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - ddp.meta_data 2083,3829424 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - ddp.configuration 199,13915 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - ddp.uuids_by_related_uuid 414431,17721698 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,570 StatusLogger.java:106 - ddp.file_by_file_id 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,571 StatusLogger.java:106 - ddp.fingerprint_by_content_type_mv 83007,3317002 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,571 StatusLogger.java:106 - ddp.heartbeat 35855,13407 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,574 StatusLogger.java:106 - ddp.concept 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,575 StatusLogger.java:106 - ddp.classification_scheme 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - ddp.rendering_relations_mv 9665,1201123 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - ddp.file 41,19826 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - ddp.rendering_relations 276049,13465909 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,576 StatusLogger.java:106 - ddp.semantic_group 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - ddp.rendering 23344,68861340 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - ddp.thesauri 44,8 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - ddp.file_download 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,577 StatusLogger.java:106 - ddp.uuids_by_related_uuid_mv 172394,14524712 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - ddp.fingerprint 124884,54955730 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - ddp.ddp_status 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - ddp.log_portal_mv 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,578 StatusLogger.java:106 - system_distributed.parent_repair_history 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,579 StatusLogger.java:106 - system_distributed.repair_history 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,579 StatusLogger.java:106 - system.compaction_history 20,4595 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - system.hints 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - system.schema_aggregates 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,580 StatusLogger.java:106 - system.IndexInfo 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,581 StatusLogger.java:106 - system.schema_columnfamilies 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,581 StatusLogger.java:106 - system.schema_triggers 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - system.size_estimates 50400,764904 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - system.schema_functions 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - system.paxos 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,582 StatusLogger.java:106 - system.views_builds_in_progress 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - system.built_views 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - system.peer_events 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - system.range_xfers 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,583 StatusLogger.java:106 - system.peers 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,584 StatusLogger.java:106 - system.batches 188441,33975764 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,584 StatusLogger.java:106 - system.schema_keyspaces 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - system.schema_usertypes 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - system.local 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,585 StatusLogger.java:106 - system.sstable_activity 1376,24807 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,587 StatusLogger.java:106 - system.available_ranges 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,587 StatusLogger.java:106 - system.batchlog 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,588 StatusLogger.java:106 - system.schema_columns 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,589 StatusLogger.java:106 - system_schema.columns 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,589 StatusLogger.java:106 - system_schema.types 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - system_schema.indexes 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - system_schema.keyspaces 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,590 StatusLogger.java:106 - system_schema.dropped_columns 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - system_schema.aggregates 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - system_schema.triggers 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - system_schema.tables 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,592 StatusLogger.java:106 - system_schema.views 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - system_schema.functions 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - system_auth.roles 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - system_auth.role_members 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,593 StatusLogger.java:106 - system_auth.resource_role_permissons_index 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - system_auth.role_permissions 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - system_traces.sessions 0,0 INFO [ScheduledTasks:1] 2016-04-12 17:35:35,594 StatusLogger.java:106 - system_traces.events 0,0 ----------------------------------------------------------------------------- compactionstats: ----------------------------------------------------------------------------- pending tasks: 32 id compaction type keyspace table completed total unit progress 3bf89310-00d1-11e6-a5a3-f125ce747d55 Compaction ddp rendering_relations 1,09 GB 5,43 GB bytes 20,10% 9c5bd6b1-00d4-11e6-a5a3-f125ce747d55 Compaction ddp meta_data 243,64 MB 338,56 MB bytes 71,96% 50308cb0-00d2-11e6-a5a3-f125ce747d55 Compaction ddp uuids_by_related_uuid 1,37 GB 2,17 GB bytes 63,11% e8965d90-00c3-11e6-a5a3-f125ce747d55 Compaction ddp rendering 19,02 GB 24,09 GB bytes 78,96% ----------------------------------------------------------------------------- Thank you in advance :) Yours sincerely, Bo Madsen