Nevermind, we resolved the issue.... JVM heap settings were misconfigured Martin
On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m.mac...@gmail.com> wrote: > Hi all, > > We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded > it to 3.11.2. > > Each node has 32 GB RAM, 8 GB Cassandra heap size. > > After the upgrade, clients started reporting connection issues: > > cassandra | [ERROR] Closing established connection pool to host > <redacted> because of the following error: Read error 'connection > reset by peer' (src/pool.cpp:384) > cassandra | [ERROR] Unable to establish a control connection to host > <redacted> because of the following error: Error: 'Request timed out' > (0x0100000E) (src/control_connection.cpp:263) > > > Cassandra logs are full of garbage collection warnings: > > WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 - > ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 -> > 1774446288; Par Survivor Space: 858980344 -> 0 > INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 - > Pool Name Active Pending Completed Blocked > All Time Blocked > INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - > MutationStage 10 9 2526002 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - > ViewMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - > ReadStage 2 2 943544 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - > RequestResponseStage 0 0 1666876 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - > ReadRepairStage 0 0 10362 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - > CounterMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - > MiscStage 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - > CompactionExecutor 0 0 3076 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - > MemtableReclaimMemory 0 0 44 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - > PendingRangeCalculator 0 0 4 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - > GossipStage 0 0 14287 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - > SecondaryIndexManagement 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - > HintsDispatcher 0 0 1 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_1 0 0 37 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_2 0 0 37 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - > MigrationStage 0 0 2 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - > MemtablePostFlush 0 0 72 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_0 0 0 44 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - > ValidationExecutor 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - > Sampler 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - > MemtableFlushWriter 0 0 44 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_5 0 0 37 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - > InternalResponseStage 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_3 0 0 37 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - > PerDiskMemtableFlushWriter_4 0 0 37 > 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - > AntiEntropyStage 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - > CacheCleanupExecutor 0 0 0 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - > Native-Transport-Requests 67 0 3627740 0 > 625945 > INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - > UserDefinedFunctions 2 36 295998373 0 > 0 > INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 - > CompactionManager 0 0 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 - > MessagingService n/a 1/1 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 - > Cache Type Size Capacity > KeysToSave > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 - > KeyCache 9066584 104857600 > all > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 - > RowCache 0 0 > all > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 - > Table Memtable ops,data > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - > system_distributed.parent_repair_history 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - > system_distributed.repair_history 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - > system_distributed.view_build_status 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - > system.compaction_history 18,5064 > INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - > system.schema_aggregates 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.schema_triggers 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.size_estimates 27018,940475 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.paxos 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.views_builds_in_progress 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.batches 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.schema_keyspaces 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.sstable_activity 118152,2241249 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.batchlog 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.schema_columns 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.hints 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - > system.IndexInfo 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.schema_columnfamilies 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.schema_functions 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.built_views 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.peer_events 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.range_xfers 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.peers 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.transferred_ranges 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.schema_usertypes 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.local 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.available_ranges 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - > system.prepared_statements 67,13574 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.columns 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.types 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.indexes 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.keyspaces 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.dropped_columns 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.aggregates 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.triggers 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - > system_schema.tables 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - > system_schema.views 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - > system_schema.functions 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - > <redacted>.event_postmortem 2855,140051 > INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - > <redacted>.group_meta 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - > <redacted>.group_postmortem 5715,381267 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_source 49336,24272521 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_alert 1592,78328 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_group 59896,29310038 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.group_alert 3185,212161 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_child 17319,9226867 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_output 123016,242316519 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event 137939,66815911 > INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - > <redacted>.event_meta 1,57 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > <redacted>.postmortem 4929,4353986 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > <redacted>.audit_type 8003,2624883 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > <redacted>.audit 8005,2934624 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_auth.roles 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_auth.role_members 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_auth.resource_role_permissons_index 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_auth.role_permissions 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_traces.sessions 0,0 > INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - > system_traces.events 0,0 > > Also, there are many warnings about UDFs executing too slow: > > WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327 > UDFunction.java:436 - User defined function asm_log.ttime_state : > (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint, > bigint>> ran longer than 500ms > > And sometimes the node crashes with the following: > ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027 > JVMStabilityInspector.java:162 - JVM state determined to be unstable. > Exiting forcefully due to: > java.util.concurrent.TimeoutException: User defined function > <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) -> > frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM > > There is no substantial increase in load average. > > We increased heap size to 12 GB, that did not help. Is there anything > else we can try, besides downgrading? > > > > Thanks, > > Martin --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org