@Alain, @Jeff Thank you very much for your time. I really appreciate it!
Yes, I found many posts/hints about TWCS; it definitely looks very promising. Do I
understand correctly that I can swap the compaction strategy without any major
concern? (Concretely, I am thinking of something along the lines of the ALTER
TABLE sketch at the very bottom of this mail.) About read repair: am I correct in
thinking that read repair is controlled by both options, 'read_repair_chance' and
'dclocal_read_repair_chance'? If that is the case, I see that I still have read
repair turned on...

Best!

On Mon, Jul 11, 2016 at 10:05 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> @Jeff
>
> Rather than being an alternative, isn't your compaction strategy going to
> deprecate (and finally replace) DTCS? That was my understanding from the
> ticket CASSANDRA-9666.
>
> @Riccardo
>
> If you are interested in TWCS from Jeff, I believe it was introduced in
> 3.0.8 actually, not 3.0.7:
> https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28.
> Anyway, you can use it in any recent version as compaction strategies are
> pluggable.
>
>> What concerns me is that I have a high tombstone read count despite those
>> being insert-only tables. Compacting the table makes the tombstone issue
>> disappear. Yes, we are using TTL to expire data after 3 months and I have
>> not touched the GC grace period.
>
> I observed the same issue recently and I am confident that TWCS will solve
> this tombstone issue, but it is not tested on my side so far. Meanwhile, be
> sure you have disabled any "read repair" on tables using DTCS, and maybe
> hints as well. It is a hard decision to take as you'll lose 2 out of 3
> anti-entropy systems, but DTCS behaves badly with those options turned on
> (TWCS is fine with them). The last anti-entropy mechanism is a full repair,
> which you might already not be running as you only do inserts...
>
> Also, instead of major compactions (which come with their own set of
> issues/tradeoffs) you can think of a script that smartly uses
> sstablemetadata to find the SSTables holding too many tombstones and runs
> single-SSTable compactions on them through JMX user-defined compactions.
> Meanwhile, if you want to do it manually, you could use something like this
> to see the tombstone ratio of the biggest SSTables:
>
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' &&
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' | xargs sstablemetadata | grep tombstones
>
> And something like this to run a user-defined compaction on the ones you
> choose (big SSTables with a high tombstone ratio):
>
> echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction <Data_db_file_name_without_path>" | java -jar jmxterm-version.jar -l <ip>:<jmx_port>
>
> *note:* you have to download jmxterm (or use any other JMX tool).
>
> Did you give unchecked_tombstone_compaction a try as well (it is a
> compaction option at the table level)? Feel free to set it to true; I think
> it could be the default. It is safe as long as your machines have some
> spare resources available (not that much is needed). That's the first thing
> I would do.
>
> Also, if you use TTL only, feel free to reduce gc_grace_seconds; this will
> probably help getting tombstones removed. I would start with the other
> solutions first. Keep in mind that if someday you perform deletes, this
> setting could produce some zombies (deleted data coming back) if you don't
> run repair on the entire ring within gc_grace_seconds.
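>
> To make those table-level settings concrete, a rough sketch of the change
> could look like this (untested on my side; <keyspace>.<table> is a
> placeholder, the values are only examples, and any non-default DTCS
> sub-options you use should be added back to the compaction map):
>
> ALTER TABLE <keyspace>.<table>
>   WITH read_repair_chance = 0.0           -- read repair across the whole cluster
>   AND dclocal_read_repair_chance = 0.0    -- read repair within the local DC
>   AND compaction = {'class': 'DateTieredCompactionStrategy',
>                     'unchecked_tombstone_compaction': 'true'};
>
> Only lower gc_grace_seconds on top of that once you are sure you never
> perform deletes.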
>
> C*heers,
>
> -----------------------
>
> Alain Rodriguez - al...@thelastpickle.com
>
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-07-07 19:25 GMT+02:00 Jeff Jirsa <jeff.ji...@crowdstrike.com>:
>
>> 48 SSTables isn't unreasonable in a DTCS table. The count will continue to
>> grow over time, but ideally data will expire as it nears your 90-day TTL
>> and those SSTables should start dropping away as they age.
>>
>> 3.0.7 introduces an alternative to DTCS you may find easier to use, called
>> TWCS. It will almost certainly help address the growing SSTable count.
>>
>> *From: *Riccardo Ferrari <ferra...@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Thursday, July 7, 2016 at 6:49 AM
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *DTCS SSTable count issue
>>
>> Hi everyone,
>>
>> This is my first question, so apologies if I get something wrong.
>>
>> I have a small Cassandra cluster built upon 3 nodes. Originally born as a
>> 2.0.x cluster, it was upgraded to 2.0.15, then 2.1.13, then 3.0.4 and
>> recently 3.0.6. Ubuntu is the OS.
>>
>> There are a few tables that use DateTieredCompactionStrategy and are
>> suffering from a constantly growing SSTable count. I have the feeling this
>> has something to do with the upgrade, but I need some hints on how to
>> debug this issue.
>>
>> Tables are created like:
>>
>> CREATE TABLE <table> (
>>     ...
>>     PRIMARY KEY (...)
>> ) WITH CLUSTERING ORDER BY (...)
>>     AND bloom_filter_fp_chance = 0.01
>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>     AND comment = ''
>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND crc_check_chance = 1.0
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 7776000
>>     AND gc_grace_seconds = 864000
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99PERCENTILE';
>>
>> and this is the "nodetool cfstats" output for that table:
>>
>> Read Count: 39
>> Read Latency: 85.03307692307692 ms.
>> Write Count: 9845275
>> Write Latency: 0.09604882382665797 ms.
>> Pending Flushes: 0
>> Table: <table>
>> SSTable count: 48
>> Space used (live): 19566109394
>> Space used (total): 19566109394
>> Space used by snapshots (total): 109796505570
>> Off heap memory used (total): 11317941
>> SSTable Compression Ratio: 0.22632301701483284
>> Number of keys (estimate): 2557
>> Memtable cell count: 0
>> Memtable data size: 0
>> Memtable off heap memory used: 0
>> Memtable switch count: 828
>> Local read count: 39
>> Local read latency: 93.051 ms
>> Local write count: 9845275
>> Local write latency: 0.106 ms
>> Pending flushes: 0
>> Bloom filter false positives: 2
>> Bloom filter false ratio: 0.00000
>> Bloom filter space used: 10200
>> Bloom filter off heap memory used: 9816
>> Index summary off heap memory used: 4677
>> Compression metadata off heap memory used: 11303448
>> Compacted partition minimum bytes: 150
>> Compacted partition maximum bytes: 4139110981
>> Compacted partition mean bytes: 13463937
>> Average live cells per slice (last five minutes): 59.69230769230769
>> Maximum live cells per slice (last five minutes): 149
>> Average tombstones per slice (last five minutes): 8.564102564102564
>> Maximum tombstones per slice (last five minutes): 42
>>
>> According to the "nodetool compactionhistory" output for <keyspace>.<table>,
>> the oldest timestamp is "Thu, 30 Jun 2016 13:14:23 GMT"
>> and the most recent one is "Thu, 07 Jul 2016 12:15:50 GMT" (that is today).
>>
>> However the SSTable count is still very high compared to tables that use a
>> different compaction strategy. If I run "nodetool compact <table>" the
>> SSTable count decreases dramatically to a reasonable number.
>>
>> I read many articles, including
>> http://www.datastax.com/dev/blog/datetieredcompactionstrategy,
>> however I cannot really tell whether this is expected behavior.
>>
>> What concerns me is that I have a high tombstone read count despite those
>> being insert-only tables. Compacting the table makes the tombstone issue
>> disappear. Yes, we are using TTL to expire data after 3 months and I have
>> not touched the GC grace period.
>>
>> Looking at the file system I see the very first *-Data.db file, which is
>> 15GB, and then all the other 43 *-Data.db files, which range from 50 to
>> 150MB in size.
>>
>> How can I debug this mis-compaction issue? Any help is much appreciated.
>>
>> Best,
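
PS: the compaction strategy swap I have in mind is just an ALTER TABLE along
these lines (an untested sketch on my side; <keyspace>.<table> is a placeholder
and I still have to pick a window unit/size that makes sense for our 90-day TTL):

ALTER TABLE <keyspace>.<table>
  WITH compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '1'};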