Hi list members,

Basic info
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;
8 Linux nodes, SSD, 64 GB memory on each server.
Additional information after the signature.

We are about to enter production with this new cluster and are using our own 
(homemade) application to test with.

Problem
We see this frequently in system.log on all servers:

Timestamp WARN  [PERIODIC-COMMIT-LOG-SYNCER] NoSpamLogger.java:94 - Out of 27 
commit log syncs over the past 266.24s with average duration of 53.21ms, 1 have 
exceeded the configured commit interval by an average of 3.89ms
(The last number varies from message to message but is never over 1000 ms; 
it is usually in the 100 ms range.)
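
To put the numbers in that WARN line in perspective, here is a minimal sketch in 
Python that pulls the counters out of the message and derives the share of late 
syncs and the average time between syncs. The regular expression and the 
function name are assumptions made for illustration; they are not part of 
Cassandra or of our application.

```python
import re

# Illustrative pattern for the PERIODIC-COMMIT-LOG-SYNCER warning quoted above.
# The regex and the function name are assumptions, not a Cassandra API.
SYNC_WARN = re.compile(
    r"Out of (?P<total>\d+) commit log syncs over the past (?P<window>[\d.]+)s "
    r"with average duration of (?P<avg_ms>[\d.]+)ms, (?P<late>\d+) have "
    r"exceeded the configured commit interval by an average of (?P<over_ms>[\d.]+)ms"
)


def summarize_sync_warning(line):
    """Return the counters from one WARN message, or None if it does not match."""
    # Collapse line wraps and repeated spaces so mail-wrapped log lines still match.
    normalized = " ".join(line.split())
    m = SYNC_WARN.search(normalized)
    if m is None:
        return None
    total = int(m.group("total"))
    late = int(m.group("late"))
    window_s = float(m.group("window"))
    return {
        "total": total,
        "late": late,
        "late_fraction": late / total,
        "seconds_between_syncs": window_s / total,
        "avg_sync_ms": float(m.group("avg_ms")),
        "avg_overrun_ms": float(m.group("over_ms")),
    }


sample = """Timestamp WARN  [PERIODIC-COMMIT-LOG-SYNCER] NoSpamLogger.java:94 - Out of 27
commit log syncs over the past 266.24s with average duration of 53.21ms, 1 have
exceeded the configured commit interval by an average of 3.89ms"""

result = summarize_sync_warning(sample)
print(result)
```

For the message above this reports 1 late sync out of 27 and roughly 9.9 s 
between syncs, which is in line with commitlog_sync_period_in_ms: 10000 from our 
cassandra.yaml.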

We have had one ERROR log message on one node:

Timestamp ERROR [MutationStage-2] StorageProxy.java:1414 - Failed to apply 
mutation locally : {}
java.lang.IllegalArgumentException: Mutation of 24.142MiB is too large for the 
maximum size of 16.000MiB

On two other nodes we got this:
Timestamp WARN  [MutationStage-3] AbstractLocalAwareExecutorService.java:167 - 
Uncaught exception on thread Thread[MutationStage-3,5,main]: {}
java.lang.IllegalArgumentException: Mutation of 24.142MiB is too large for the 
maximum size of 16.000MiB

Our application got this in its log:
Cassandra failure during write query at consistency QUORUM (2 responses were 
required but only 0 replica responded, 2 failed)
com.datastax.driver.core.exceptions.WriteFailureException: Cassandra failure 
during write query at consistency QUORUM (2 responses were required but only 0 
replica responded, 2 failed)
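
The 16.000MiB limit in these messages appears to follow from the commit log 
settings. A minimal sketch of the arithmetic in Python, assuming that Cassandra 
3.x defaults the maximum mutation size to half of commitlog_segment_size_in_mb 
unless max_mutation_size_in_kb is set explicitly (the helper name is 
hypothetical):

```python
def max_mutation_size_mib(commitlog_segment_size_in_mb, max_mutation_size_in_kb=None):
    """Effective mutation size limit in MiB.

    Assumption: when max_mutation_size_in_kb is not set, the limit is half
    of commitlog_segment_size_in_mb.
    """
    if max_mutation_size_in_kb is not None:
        return max_mutation_size_in_kb / 1024
    return commitlog_segment_size_in_mb / 2


# Values taken from the cassandra.yaml and the log messages in this mail.
limit_mib = max_mutation_size_mib(32)   # commitlog_segment_size_in_mb: 32
mutation_mib = 24.142                   # size reported in the ERROR/WARN lines
too_large = mutation_mib > limit_mib

print(f"limit={limit_mib} MiB, mutation={mutation_mib} MiB, rejected={too_large}")
```

Under that assumption, our commitlog_segment_size_in_mb of 32 gives a 16 MiB 
limit, so a 24.142 MiB mutation is rejected on each replica that receives it.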

Are the WARNings a sign that there can be ERRORs like this? Are they related 
somehow?

We decided to relax some performance parameters in our application, and the 
WARN log messages now appear very seldom, but they are still there. We have 
also seen the same WARN log message at night, when we don't run our 
application at all, so those WARN messages were unexpected.

There are no GC warnings about long pauses.

Any thoughts about how to proceed with this issue?

Kind regards
Frank Limstrand
National Library of Norway


All tables created like this:
CREATE TABLE mykeyspace.mytable (
    key blob,
    column1 timeuuid,
    column2 text,
    value blob,
    PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC, column2 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = 'Column Family for storing job execution record information'
    AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cassandra.yaml:
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /d1/cassandra/data/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
    - /d1/cassandra/data
    - /d2/cassandra/data
commitlog_directory: /d1/cassandra/commitlog
cdc_enabled: false
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /d2/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
concurrent_reads: 32
concurrent_writes: 96
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: true
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: ip
start_native_transport: true
native_transport_port: 9042
start_rpc: false
rpc_address: ip
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
concurrent_compactors: 12
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
stream_throughput_outbound_megabits_per_sec: 400
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false

jvm.options
-Xms24G
-Xmx24G
-XX:+UseG1GC
