Martin, Would you pls share settings you had before and what did you change? We have similar issue.
> On Mar 23, 2018, at 8:47 AM, Martin Mačura <m.mac...@gmail.com> wrote: > > Nevermind, we resolved the issue.... JVM heap settings were misconfigured > > Martin > >> On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m.mac...@gmail.com> wrote: >> Hi all, >> >> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded >> it to 3.11.2. >> >> Each node has 32 GB RAM, 8 GB Cassandra heap size. >> >> After the upgrade, clients started reporting connection issues: >> >> cassandra | [ERROR] Closing established connection pool to host >> <redacted> because of the following error: Read error 'connection >> reset by peer' (src/pool.cpp:384) >> cassandra | [ERROR] Unable to establish a control connection to host >> <redacted> because of the following error: Error: 'Request timed out' >> (0x0100000E) (src/control_connection.cpp:263) >> >> >> Cassandra logs are full of garbage collection warnings: >> >> WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 - >> ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 -> >> 1774446288; Par Survivor Space: 858980344 -> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 - >> Pool Name Active Pending Completed Blocked >> All Time Blocked >> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - >> MutationStage 10 9 2526002 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - >> ViewMutationStage 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - >> ReadStage 2 2 943544 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 - >> RequestResponseStage 0 0 1666876 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - >> ReadRepairStage 0 0 10362 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - >> CounterMutationStage 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - >> MiscStage 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - >> CompactionExecutor 0 0 3076 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 - >> MemtableReclaimMemory 0 0 44 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - >> PendingRangeCalculator 0 0 4 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - >> GossipStage 0 0 14287 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - >> SecondaryIndexManagement 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 - >> HintsDispatcher 0 0 1 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_1 0 0 37 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_2 0 0 37 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 - >> MigrationStage 0 0 2 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - >> MemtablePostFlush 0 0 72 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_0 0 0 44 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - >> ValidationExecutor 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 - >> Sampler 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - >> MemtableFlushWriter 0 0 44 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_5 0 0 37 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 - >> InternalResponseStage 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_3 0 0 37 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 - >> PerDiskMemtableFlushWriter_4 0 0 37 >> 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - >> AntiEntropyStage 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - >> CacheCleanupExecutor 0 0 0 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - >> Native-Transport-Requests 67 0 3627740 0 >> 625945 >> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 - >> UserDefinedFunctions 2 36 295998373 0 >> 0 >> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 - >> CompactionManager 0 0 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 - >> MessagingService n/a 1/1 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 - >> Cache Type Size Capacity >> KeysToSave >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 - >> KeyCache 9066584 104857600 >> all >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 - >> RowCache 0 0 >> all >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 - >> Table Memtable ops,data >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - >> system_distributed.parent_repair_history 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - >> system_distributed.repair_history 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - >> system_distributed.view_build_status 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - >> system.compaction_history 18,5064 >> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 - >> system.schema_aggregates 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.schema_triggers 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.size_estimates 27018,940475 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.paxos 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.views_builds_in_progress 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.batches 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.schema_keyspaces 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.sstable_activity 118152,2241249 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.batchlog 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.schema_columns 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.hints 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 - >> system.IndexInfo 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.schema_columnfamilies 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.schema_functions 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.built_views 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.peer_events 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.range_xfers 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.peers 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.transferred_ranges 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.schema_usertypes 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.local 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.available_ranges 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 - >> system.prepared_statements 67,13574 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.columns 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.types 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.indexes 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.keyspaces 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.dropped_columns 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.aggregates 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.triggers 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 - >> system_schema.tables 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - >> system_schema.views 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - >> system_schema.functions 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - >> <redacted>.event_postmortem 2855,140051 >> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - >> <redacted>.group_meta 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 - >> <redacted>.group_postmortem 5715,381267 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_source 49336,24272521 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_alert 1592,78328 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_group 59896,29310038 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.group_alert 3185,212161 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_child 17319,9226867 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_output 123016,242316519 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event 137939,66815911 >> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 - >> <redacted>.event_meta 1,57 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> <redacted>.postmortem 4929,4353986 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> <redacted>.audit_type 8003,2624883 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> <redacted>.audit 8005,2934624 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_auth.roles 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_auth.role_members 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_auth.resource_role_permissons_index 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_auth.role_permissions 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_traces.sessions 0,0 >> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 - >> system_traces.events 0,0 >> >> Also, there are many warnings about UDFs executing too slow: >> >> WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327 >> UDFunction.java:436 - User defined function asm_log.ttime_state : >> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint, >> bigint>> ran longer than 500ms >> >> And sometimes the node crashes with the following: >> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027 >> JVMStabilityInspector.java:162 - JVM state determined to be unstable. >> Exiting forcefully due to: >> java.util.concurrent.TimeoutException: User defined function >> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) -> >> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM >> >> There is no substantial increase in load average. >> >> We increased heap size to 12 GB, that did not help. Is there anything >> else we can try, besides downgrading? >> >> >> >> Thanks, >> >> Martin > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org