Martin,

Would you pls share settings you had before and what did you change? We have 
similar issue.



> On Mar 23, 2018, at 8:47 AM, Martin Mačura <m.mac...@gmail.com> wrote:
> 
> Nevermind, we resolved the issue....  JVM heap settings were misconfigured
> 
> Martin
> 
>> On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m.mac...@gmail.com> wrote:
>> Hi all,
>> 
>> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
>> it to 3.11.2.
>> 
>> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>> 
>> After the upgrade, clients started reporting connection issues:
>> 
>> cassandra | [ERROR] Closing established connection pool to host
>> <redacted> because of the following error: Read error 'connection
>> reset by peer' (src/pool.cpp:384)
>> cassandra | [ERROR] Unable to establish a control connection to host
>> <redacted> because of the following error: Error: 'Request timed out'
>> (0x0100000E) (src/control_connection.cpp:263)
>> 
>> 
>> Cassandra logs are full of garbage collection warnings:
>> 
>> WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
>> ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
>> 1774446288; Par Survivor Space: 858980344 -> 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
>> Pool Name                    Active   Pending      Completed   Blocked
>> All Time Blocked
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> MutationStage                    10         9        2526002         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ViewMutationStage                 0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ReadStage                         2         2         943544         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> RequestResponseStage              0         0        1666876         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> ReadRepairStage                   0         0          10362         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CounterMutationStage              0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MiscStage                         0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CompactionExecutor                0         0           3076         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MemtableReclaimMemory             0         0             44         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> PendingRangeCalculator            0         0              4         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> GossipStage                       0         0          14287         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> SecondaryIndexManagement          0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> HintsDispatcher                   0         0              1         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_1         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_2         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> MigrationStage                    0         0              2         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> MemtablePostFlush                 0         0             72         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_0         0         0             44
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> ValidationExecutor                0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> Sampler                           0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> MemtableFlushWriter               0         0             44         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_5         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> InternalResponseStage             0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_3         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_4         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> AntiEntropyStage                  0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> CacheCleanupExecutor              0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> Native-Transport-Requests        67         0        3627740         0
>>           625945
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> UserDefinedFunctions              2        36      295998373         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
>> CompactionManager                 0         0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
>> MessagingService                n/a       1/1
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
>> Cache Type                     Size                 Capacity
>>    KeysToSave
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
>> KeyCache                    9066584                104857600
>>           all
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
>> RowCache                          0                        0
>>           all
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
>> Table                       Memtable ops,data
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.parent_repair_history                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.repair_history                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.view_build_status                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.compaction_history             18,5064
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.schema_aggregates                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_triggers                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.size_estimates            27018,940475
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.paxos                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.views_builds_in_progress                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batches                            0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_keyspaces                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.sstable_activity        118152,2241249
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batchlog                           0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_columns                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.hints                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.IndexInfo                          0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_columnfamilies                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_functions                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.built_views                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peer_events                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.range_xfers                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peers                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.transferred_ranges                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_usertypes                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.local                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.available_ranges                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.prepared_statements            67,13574
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.columns                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.types                       0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.indexes                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.keyspaces                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.dropped_columns                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.aggregates                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.triggers                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.tables                      0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.views                       0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.functions                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.event_postmortem          2855,140051
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_meta                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_postmortem          5715,381267
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_source           49336,24272521
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_alert                1592,78328
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_group            59896,29310038
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.group_alert               3185,212161
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_child             17319,9226867
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_output         123016,242316519
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event                 137939,66815911
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_meta                       1,57
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.postmortem               4929,4353986
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit_type                   8003,2624883
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit                        8005,2934624
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.roles                         0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_members                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.resource_role_permissons_index                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_permissions                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.sessions                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.events                      0,0
>> 
>> Also, there are many warnings about UDFs executing too slow:
>> 
>> WARN  [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
>> UDFunction.java:436 - User defined function asm_log.ttime_state :
>> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
>> bigint>> ran longer than 500ms
>> 
>> And sometimes the node crashes with the following:
>> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
>> JVMStabilityInspector.java:162 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.util.concurrent.TimeoutException: User defined function
>> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
>> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
>> 
>> There is no substantial increase in load average.
>> 
>> We increased heap size to 12 GB, that did not help.  Is there anything
>> else we can try, besides downgrading?
>> 
>> 
>> 
>> Thanks,
>> 
>> Martin
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Reply via email to