Hi all, We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded it to 3.11.2.
Each node has 32 GB RAM, 8 GB Cassandra heap size. After the upgrade, clients started reporting connection issues: cassandra | [ERROR] Closing established connection pool to host <redacted> because of the following error: Read error 'connection reset by peer' (src/pool.cpp:384) cassandra | [ERROR] Unable to establish a control connection to host <redacted> because of the following error: Error: 'Request timed out' (0x0100000E) (src/control_connection.cpp:263) Cassandra logs are full of garbage collection warnings: WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 - ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 -> 1774446288; Par Survivor Space: 858980344 -> 0 INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - MutationStage 10 9 2526002 0 0 INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - ReadStage 2 2 943544 0 0 INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - RequestResponseStage 0 0 1666876 0 0 INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - ReadRepairStage 0 0 10362 0 0 INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - MiscStage 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - CompactionExecutor 0 0 3076 0 0 INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - MemtableReclaimMemory 0 0 44 0 0 INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - PendingRangeCalculator 0 0 4 0 0 INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - GossipStage 0 0 14287 0 0 INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - HintsDispatcher 0 0 1 0 0 INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 - PerDiskMemtableFlushWriter_1 0 0 37 0 0 INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - PerDiskMemtableFlushWriter_2 0 0 37 0 0 INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - MigrationStage 0 0 2 0 0 INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - MemtablePostFlush 0 0 72 0 0 INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 44 0 0 INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - Sampler 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - MemtableFlushWriter 0 0 44 0 0 INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - PerDiskMemtableFlushWriter_5 0 0 37 0 0 INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - InternalResponseStage 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - PerDiskMemtableFlushWriter_3 0 0 37 0 0 INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - PerDiskMemtableFlushWriter_4 0 0 37 0 0 INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0 INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - Native-Transport-Requests 67 0 3627740 0 625945 INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - UserDefinedFunctions 2 36 295998373 0 0 INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 - CompactionManager 0 0 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 - MessagingService n/a 1/1 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 - Cache Type Size Capacity KeysToSave INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 - KeyCache 9066584 104857600 all INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 - RowCache 0 0 all INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 - Table Memtable ops,data INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - system_distributed.parent_repair_history 0,0 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - system_distributed.repair_history 0,0 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - system_distributed.view_build_status 0,0 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - system.compaction_history 18,5064 INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - system.schema_aggregates 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.schema_triggers 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.size_estimates 27018,940475 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.paxos 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.views_builds_in_progress 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.batches 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.schema_keyspaces 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.sstable_activity 118152,2241249 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.batchlog 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.schema_columns 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.hints 0,0 INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - system.IndexInfo 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.schema_columnfamilies 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.schema_functions 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.built_views 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.peer_events 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.range_xfers 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.peers 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.transferred_ranges 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.schema_usertypes 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.local 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.available_ranges 0,0 INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - system.prepared_statements 67,13574 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.columns 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.types 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.indexes 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.keyspaces 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.dropped_columns 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.aggregates 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.triggers 0,0 INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - system_schema.tables 0,0 INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - system_schema.views 0,0 INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - system_schema.functions 0,0 INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - <redacted>.event_postmortem 2855,140051 INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - <redacted>.group_meta 0,0 INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - <redacted>.group_postmortem 5715,381267 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_source 49336,24272521 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_alert 1592,78328 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_group 59896,29310038 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.group_alert 3185,212161 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_child 17319,9226867 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_output 123016,242316519 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event 137939,66815911 INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - <redacted>.event_meta 1,57 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - <redacted>.postmortem 4929,4353986 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - <redacted>.audit_type 8003,2624883 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - <redacted>.audit 8005,2934624 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_auth.roles 0,0 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_auth.role_members 0,0 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_auth.resource_role_permissons_index 0,0 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_auth.role_permissions 0,0 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_traces.sessions 0,0 INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - system_traces.events 0,0 Also, there are many warnings about UDFs executing too slow: WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327 UDFunction.java:436 - User defined function asm_log.ttime_state : (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint, bigint>> ran longer than 500ms And sometimes the node crashes with the following: ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027 JVMStabilityInspector.java:162 - JVM state determined to be unstable. Exiting forcefully due to: java.util.concurrent.TimeoutException: User defined function <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) -> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM There is no substantial increase in load average. We increased heap size to 12 GB, that did not help. Is there anything else we can try, besides downgrading? Thanks, Martin --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org