We are using TWCS compaction.

Here's one sample table:

CREATE TABLE ae.raw_logs_by_user (
    dtm_id bigint,
    company_id int,
    source text,
    status_id int,
    log_date bigint,
    uuid_least bigint,
    uuid_most bigint,
    profile_system_id int,
    parent_message_id int,
    parent_template_id int,
    record text,
    PRIMARY KEY (dtm_id, company_id, source, status_id, log_date,
                 uuid_least, uuid_most, profile_system_id)
) WITH CLUSTERING ORDER BY (company_id ASC, source ASC, status_id ASC,
                            log_date DESC, uuid_least ASC, uuid_most ASC,
                            profile_system_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                      'compaction_window_size': '7', 'compaction_window_unit': 'DAYS',
                      'max_threshold': '4', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64',
                       'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
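For reference, the compaction settings shown above (window size, min/max thresholds) can be changed on a live table with ALTER TABLE rather than recreating it; a minimal CQL sketch, with values simply mirroring what the schema already uses:

ALTER TABLE ae.raw_logs_by_user
    WITH compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                       'compaction_window_size': '7', 'compaction_window_unit': 'DAYS',
                       'min_threshold': '4', 'max_threshold': '4'};

Changing the settings this way updates the table schema cluster-wide, so every node picks up the new values.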



From: "Fay Hou [Storage Service] ­" <fay...@coupang.com>
Date: Tuesday, August 22, 2017 at 10:52 AM
To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>, Surbhi Gupta 
<surbhi.gupt...@gmail.com>
Subject: Re: Cassandra crashes....

What kind of compaction? LCS?

On Aug 22, 2017 8:39 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com> wrote:

Surbhi and Fay,



I agree we have plenty of RAM to spare.

However, our data load and compaction churn is so high (partially thanks to SSDs!) that it is causing too much GC pressure.

And as you know, Eden space and survivor space cleanup is a stop-the-world (STW) event, hence a larger heap will increase the GC pauses.



As for "what happens" during the crash - nothing.

It seems that the daemon just dies silently.



If you are interested, attached are the Cassandra system.log and the detailed GC log files.



system.log = Cassandra log (see line 424 - it’s the last line before the crash)



cassandra-gc.log.8.currrent = last GC log at the time of the crash

cassandra-gc.log.0 = GC log after startup



If you want to compare the GC pauses, grep the GC log files for the word "stopped"

(e.g. grep stopped cassandra-gc.log.*)
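If it helps, here is a slightly more detailed sketch along the same lines, assuming the logs contain the usual Java 8 safepoint lines ("Total time for which application threads were stopped: ... seconds"), which appear when -XX:+PrintGCApplicationStoppedTime is enabled:

# Count, total, max and average stop-the-world pause across the GC logs
awk '/Total time for which application threads were stopped/ {
         for (i = 1; i <= NF; i++) if ($i == "stopped:") p = $(i + 1) + 0
         sum += p; if (p > max) max = p; n++
     }
     END { if (n) printf "pauses=%d  total=%.2fs  max=%.3fs  avg=%.1fms\n",
                         n, sum, max, 1000 * sum / n }' cassandra-gc.log.*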



Thanks for the quick replies!



Jayesh





From: Surbhi Gupta <surbhi.gupt...@gmail.com>
Date: Tuesday, August 22, 2017 at 10:19 AM
To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>, "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Cassandra crashes....



16 GB heap is too small for G1GC. Try at least 32 GB of heap size.

On Tue, Aug 22, 2017 at 7:58 AM Fay Hou [Storage Service] <fay...@coupang.com> wrote:

What errors do you see?

16 GB of 256 GB. The heap is too small. I would give the heap at least 160 GB.
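As a point of reference, the heap size itself is set in Cassandra's conf/jvm.options (or via MAX_HEAP_SIZE in conf/cassandra-env.sh); a minimal sketch, using the 32 GB figure suggested above purely as an example value:

# conf/jvm.options -- set min and max heap to the same value
# (32 GB here only echoes the suggestion above; it is not a recommendation)
-Xms32G
-Xmx32G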

On Aug 22, 2017 7:42 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com> wrote:

Hi All,

We are somewhat new users of Cassandra 3.10 on Linux and wanted to ping the user group for their experiences.

Our usage profile is batch jobs that load millions of rows into Cassandra every hour.

There are similar periodic batch jobs that read millions of rows, do some processing, and output the results to HDFS (no issues with HDFS).

We often see the Cassandra daemons crash.

Key points of our environment are:

Pretty good servers: 54 cores (with hyperthreading), 256 GB RAM, 3.2 TB SSD drive

Compaction: TWCS with 7-day windows, since the data retention period is limited (about 120 days)

JDK: Java 1.8.0_71 and G1 GC

Heap Size: 16 GB

Large SSTables: 50 GB to 300+ GB

We see the daemons crash after some back-to-back long GCs (1.5 to 3.5 seconds).

Note that we had set the target for GC pauses to be 200 ms.

We have been somewhat able to tame the crashes by updating the TWCS compaction properties to have min/max compaction SSTables = 4, and by drastically reducing the size of the New/Eden space (to 5% of the heap = 800 MB).



It's been about 12 hours and our stop-the-world GC pauses are under 90 ms.

Since the servers have more than sufficient resources, we are not seeing any noticeable performance impact.

Is this kind of tuning normal/expected?

Thanks,

Jayesh