Hi list members,

Basic info:

    [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]

    CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;

8 Linux nodes, SSD, 64 GB memory on each server. Additional information after the signature.

We are about to enter production with this new cluster and are using our own (homemade) application to test with.

Problem

We see this frequently in system.log on all servers:

    Timestamp WARN [PERIODIC-COMMIT-LOG-SYNCER] NoSpamLogger.java:94 - Out of 27 commit log syncs over the past 266.24s with average duration of 53.21ms, 1 have exceeded the configured commit interval by an average of 3.89ms

(The last ms number varies from log message to log message but is never over 1000 ms; it is more in the 100 ms range.)

We have had one ERROR log message on one node:

    Timestamp ERROR [MutationStage-2] StorageProxy.java:1414 - Failed to apply mutation locally : {}
    java.lang.IllegalArgumentException: Mutation of 24.142MiB is too large for the maximum size of 16.000MiB

On two other nodes we got this:

    Timestamp WARN [MutationStage-3] AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[MutationStage-3,5,main]: {}
    java.lang.IllegalArgumentException: Mutation of 24.142MiB is too large for the maximum size of 16.000MiB

Our application got this in the log:

    Cassandra failure during write query at consistency QUORUM (2 responses were required but only 0 replica responded, 2 failed)
    com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure during write query at consistency QUORUM (2 responses were required but only 0 replica responded, 2 failed)

Are the WARNings a sign that there can be ERRORs like this? Are they related somehow?

We decided to relax some performance parameters in our application, and the WARN log messages now come very seldom, but they are still there. We have also seen the same WARN log message at night, when we don't run our application at all, so those WARN messages were unexpected. There are no GC warnings about long pauses.

Any thoughts about how to proceed with this issue?

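To compare the PERIODIC-COMMIT-LOG-SYNCER warnings across nodes and over time, something like the following Python sketch could pull the numbers out of each line. The regular expression is built only from the single message quoted above, so other Cassandra versions may word it differently, and the function name `parse_sync_warning` is made up for illustration.

```python
import re

# Pattern derived from the one WARN line quoted in this post (assumption:
# the wording is stable across the log files being scanned).
SYNC_WARNING = re.compile(
    r"Out of (?P<syncs>\d+) commit log syncs over the past (?P<window_s>[\d.]+)s "
    r"with average duration of (?P<avg_ms>[\d.]+)ms, "
    r"(?P<late>\d+) have exceeded the configured commit interval "
    r"by an average of (?P<over_ms>[\d.]+)ms"
)


def parse_sync_warning(line):
    """Return the numeric fields of a commit log sync warning, or None."""
    match = SYNC_WARNING.search(line)
    if match is None:
        return None
    return {
        "syncs": int(match["syncs"]),          # syncs in the reporting window
        "window_s": float(match["window_s"]),  # length of the window, seconds
        "avg_ms": float(match["avg_ms"]),      # average sync duration
        "late": int(match["late"]),            # syncs that overran the interval
        "over_ms": float(match["over_ms"]),    # average overrun of those syncs
    }


if __name__ == "__main__":
    sample = (
        "WARN [PERIODIC-COMMIT-LOG-SYNCER] NoSpamLogger.java:94 - "
        "Out of 27 commit log syncs over the past 266.24s with average "
        "duration of 53.21ms, 1 have exceeded the configured commit "
        "interval by an average of 3.89ms"
    )
    print(parse_sync_warning(sample))
```

Running this over system.log from each node would show whether the overruns cluster at particular times, including the night-time ones.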
Kind regards
Frank Limstrand
National Library of Norway

All tables created like this:

    CREATE TABLE mykeyspace.mytable (
        key blob,
        column1 timeuuid,
        column2 text,
        value blob,
        PRIMARY KEY (key, column1, column2)
    ) WITH COMPACT STORAGE
        AND CLUSTERING ORDER BY (column1 ASC, column2 ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
        AND comment = 'Column Family for storing job execution record information'
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND crc_check_chance = 1.0
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99PERCENTILE';

cassandra.yaml:

    hinted_handoff_enabled: true
    max_hint_window_in_ms: 10800000 # 3 hours
    hinted_handoff_throttle_in_kb: 1024
    max_hints_delivery_threads: 2
    hints_directory: /d1/cassandra/data/hints
    hints_flush_period_in_ms: 10000
    max_hints_file_size_in_mb: 128
    batchlog_replay_throttle_in_kb: 1024
    authenticator: AllowAllAuthenticator
    authorizer: AllowAllAuthorizer
    role_manager: CassandraRoleManager
    roles_validity_in_ms: 2000
    permissions_validity_in_ms: 2000
    credentials_validity_in_ms: 2000
    partitioner: org.apache.cassandra.dht.RandomPartitioner
    data_file_directories:
        - /d1/cassandra/data
        - /d2/cassandra/data
    commitlog_directory: /d1/cassandra/commitlog
    cdc_enabled: false
    disk_failure_policy: stop
    commit_failure_policy: stop
    prepared_statements_cache_size_mb:
    thrift_prepared_statements_cache_size_mb:
    key_cache_size_in_mb:
    key_cache_save_period: 14400
    row_cache_size_in_mb: 0
    row_cache_save_period: 0
    counter_cache_size_in_mb:
    counter_cache_save_period: 7200
    saved_caches_directory: /d2/cassandra/saved_caches
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000
    commitlog_segment_size_in_mb: 32
    concurrent_reads: 32
    concurrent_writes: 96
    concurrent_counter_writes: 32
    concurrent_materialized_view_writes: 32
    memtable_allocation_type: heap_buffers
    index_summary_capacity_in_mb:
    index_summary_resize_interval_in_minutes: 60
    trickle_fsync: true
    trickle_fsync_interval_in_kb: 10240
    storage_port: 7000
    ssl_storage_port: 7001
    listen_address: ip
    start_native_transport: true
    native_transport_port: 9042
    start_rpc: false
    rpc_address: ip
    rpc_port: 9160
    rpc_keepalive: true
    rpc_server_type: sync
    thrift_framed_transport_size_in_mb: 15
    incremental_backups: false
    snapshot_before_compaction: false
    auto_snapshot: true
    column_index_size_in_kb: 64
    column_index_cache_size_in_kb: 2
    concurrent_compactors: 12
    compaction_throughput_mb_per_sec: 16
    sstable_preemptive_open_interval_in_mb: 50
    stream_throughput_outbound_megabits_per_sec: 400
    read_request_timeout_in_ms: 5000
    range_request_timeout_in_ms: 10000
    write_request_timeout_in_ms: 2000
    counter_write_request_timeout_in_ms: 5000
    cas_contention_timeout_in_ms: 1000
    truncate_request_timeout_in_ms: 60000
    request_timeout_in_ms: 10000
    slow_query_log_timeout_in_ms: 500
    cross_node_timeout: false
    endpoint_snitch: SimpleSnitch
    dynamic_snitch_update_interval_in_ms: 100
    dynamic_snitch_reset_interval_in_ms: 600000
    dynamic_snitch_badness_threshold: 0.1
    request_scheduler: org.apache.cassandra.scheduler.NoScheduler
    tombstone_warn_threshold: 1000
    tombstone_failure_threshold: 100000
    batch_size_warn_threshold_in_kb: 5
    batch_size_fail_threshold_in_kb: 50
    unlogged_batch_across_partitions_warn_threshold: 10
    compaction_large_partition_warning_threshold_mb: 100
    gc_warn_threshold_in_ms: 1000
    back_pressure_enabled: false

jvm.options:

    -Xms24G
    -Xmx24G
    -XX:+UseG1GC
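
A note on the 16.000MiB figure in the ERROR above: in Cassandra 3.x the mutation size limit defaults to half of commitlog_segment_size_in_mb unless max_mutation_size_in_kb is set explicitly, and with the 32 MB segments configured here that gives 16 MiB. The Python sketch below just spells out that arithmetic. It assumes max_mutation_size_in_kb is left at its default (it does not appear in the settings listed above), and the function names are hypothetical.

```python
def default_max_mutation_mib(commitlog_segment_size_in_mb):
    """Default mutation limit: half of the commit log segment size.

    Assumption: max_mutation_size_in_kb is not set in cassandra.yaml,
    so the server derives the limit from the segment size.
    """
    return commitlog_segment_size_in_mb / 2


def mutation_fits(mutation_mib, commitlog_segment_size_in_mb):
    """True if a mutation of the given size is within the default limit."""
    return mutation_mib <= default_max_mutation_mib(commitlog_segment_size_in_mb)


if __name__ == "__main__":
    segment_mb = 32          # commitlog_segment_size_in_mb from the config above
    failed_mutation = 24.142  # size reported in the ERROR message, in MiB

    print(default_max_mutation_mib(segment_mb))        # limit in MiB
    print(mutation_fits(failed_mutation, segment_mb))  # does the write fit?
```

By this reckoning a single 24.142 MiB write would be rejected by every replica that receives it, which would be consistent with the "2 failed" in the QUORUM error the application logged.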