Hi, I thought that the LCS compaction strategy was not advised for write-heavy workloads?
Do you know if the driver version has been updated on the client side?

Have a nice day.

Kind regards,
Stephane

On Fri, Mar 21, 2025 at 12:13, William Crowell via user <user@cassandra.apache.org> wrote:

Thomas and Scott,

Thank you both for your replies.

I am not noticing any exceptions in the logs, but I will check again. That description of table1 is from the current server with the table and column names obscured, but that table definition is what we are running with. You are correct here… scalability, I guess, was maybe intended initially, but they never added nodes to their ring.

I would say that the increase in disk usage is probably around 10-15%: 90GB on 3.x to a little over 100GB with 4.x. I am wondering how much extra metadata is being stored with 4.x?

I am wondering if it would be better to create a new keyspace and table using LCS instead of STCS, and LZ4 instead of Snappy, as Thomas suggested?

Regards,

William Crowell

From: C. Scott Andreas <sc...@paradoxica.net>
Date: Friday, March 21, 2025 at 12:21 AM
To: user@cassandra.apache.org
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

While it's true that COMPACT STORAGE was deprecated in Cassandra 4.0+, it was not removed. It's largely supported and works just fine for many use cases.

Feedback from users indicated that removing it entirely would be disruptive because it required special effort from Cassandra users on upgrade to 4.x. Much of COMPACT STORAGE was reintroduced in Cassandra 4.x, and works just fine – including in 5.0 and trunk.

You can find details on that work here:
https://issues.apache.org/jira/browse/CASSANDRA-16217

Excerpting the docs added at the time:
https://github.com/apache/cassandra/commit/0a49d25078665da0ec30d9e69a036de163deb9c3#diff-6a259516d12c778b2b08ec408b884a88bf94d14a9f6dd8d943c6e4376288da6dR479-R480

––––––
A compact storage table is one defined with the COMPACT STORAGE option. This option is only maintained for backward compatibility with definitions created before CQL version 3 and shouldn't be used for new tables. 4.0 supports COMPACT STORAGE partially. There is no support for super column families. Since Cassandra 3.0, compact storage tables have exactly the same internal layout as non-compact ones (for the same schema, obviously), and declaring a table with this option creates limitations for the table which are largely arbitrary (and exist for historical reasons).

Amongst those limitations:

- A compact storage table cannot use collections nor static columns.
- If a compact storage table has at least one clustering column, then it must have *exactly* one column outside of the primary key ones. This implies in particular that you cannot add or remove columns after creation.
- A compact storage table is limited in the indexes it can create, and no materialized view can be created on it.
––––––

On Mar 20, 2025, at 8:43 PM, Thomas Elliott <asf.telli...@gmail.com> wrote:

I'm now wondering how your Cassandra is starting with this table configuration. Part of migrating from 3.x should have dropped COMPACT STORAGE from your tables.
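If it wasn't dropped during the upgrade, it can be dropped explicitly. A minimal sketch, using the obscured names from the desc output further down this thread (note: as far as I know the change is one-way, and on Thrift-era tables it can expose previously hidden columns):

    cqlsh:keyspace1> ALTER TABLE table1 DROP COMPACT STORAGE;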
Do you see any exceptions when Cassandra starts up? Was the description of table1 that you shared from the current server, or from before the upgrade?

.thomas

On Thu, Mar 20, 2025 at 6:59 PM Thomas Elliott <asf.telli...@gmail.com> wrote:

William,

Longtime listener, first time poster.

Something stands out here for me. This table uses COMPACT STORAGE, which is deprecated in Cassandra 4.x and was a hold-over from the old Thrift API. I'm sure this is eliciting some gasps from the mailing list.

You might actually be better served by creating a new keyspace and table, using Leveled Compaction instead of STCS and LZ4 instead of Snappy.

After copying the data over to the new table, run a compaction and repair on the table, and I expect that you should see the footprint decrease.

What I'm concerned about is that you have COMPACT STORAGE using this legacy Thrift API, which isn't compatible with 4.0; you also have wide columns and a large partition, which exacerbates the compact storage issue.

I have to keep coming back to the fact that this is a single instance, a single replica, and a small volume of data, so I shouldn't be thinking about the scalability of your data model. That said, your data footprint between Cassandra 3.x and 4.x shouldn't be a 10x difference. It should be bigger, because 4.0 introduces additional metadata, but not THAT much bigger.

I took a stab at a potential new schema for you and this might work better...

    CREATE TABLE keyspace1.table1 (
        blob1 blob,                    -- Partition key
        blob2 blob,                    -- Clustering key
        blue3 blob,                    -- Data column
        PRIMARY KEY (blob1, blob2)     -- Composite primary key
    )
    WITH CLUSTERING ORDER BY (blob2 ASC)   -- Maintain clustering order
    AND compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'        -- Default size for LCS
    }
    AND compression = {
        'class': 'LZ4Compressor',
        'chunk_length_in_kb': '64'
    }
    AND caching = {
        'keys': 'ALL',                     -- Cache all partition keys
        'rows_per_partition': 'NONE'       -- Avoid caching rows to save memory
    }
    AND gc_grace_seconds = 864000
    AND default_time_to_live = 0
    AND speculative_retry = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND crc_check_chance = 1.0
    AND memtable_flush_period_in_ms = 0;

All the best,
.thomas

On Thu, Mar 20, 2025 at 8:36 AM Michalis Kotsiouros (EXT) via user <user@cassandra.apache.org> wrote:

Hello William,

Does your data get updated? You mentioned in one email that you switched from TTL 0 to 10 days. This might mean that some data are not deleted, because at some point in time they were written with no TTL (TTL set to 0). This has occurred in some systems in the past.

You may use the sstableexpiredblocker tool to check this.
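For example, something along these lines (a sketch only, using the obscured keyspace/table names from this thread; it reports SSTables that are fully expired but blocked from being dropped by overlapping SSTables):

    $CASSANDRA_HOME/tools/bin/sstableexpiredblocker keyspace1 table1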
BR,
MK

From: William Crowell via user <user@cassandra.apache.org>
Sent: March 20, 2025 13:58
To: user@cassandra.apache.org
Cc: William Crowell <wcrow...@perforce.com>; Sartor Fabien (DIN) <fabien.sar...@etat.ge.ch>; tasmaniede...@free.fr
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

Stephane and Fabien,

Good morning, and thank you for your replies. Your English is fine; I just appreciate the help.

Apache Cassandra was overkill for this application, and this was just something handed to me. The reason this is an issue is the limited amount of disk space they specified when creating the volume that Cassandra resides on, and it cannot be expanded. We still have plenty of disk space, but with this growth we will eventually run out. Trying to be proactive here.

I do not have any snapshots when running nodetool listsnapshots, nor backup data in the backup folder. It could be an actual increase in data size, and I am not aware of any data being injected into the table by anyone else. That's something to check into, though.

We initially created this table with Cassandra 3.x and then migrated to 4.1.3.

I will paste the table description below; I have changed the table and column names only. I would never define a table like this, with the primary key being 2 blobs. I would have at least created a uuid field as a primary key.

The compression ratio is not very good because there are 3 blob columns in the table. When I first looked at this, I felt LeveledCompactionStrategy would have been a better fit than STCS.

I do agree and see that droppable tombstones are below 5%, which is expected and should not significantly impact the data size.

Have we recently deleted a large amount of data? Not to my knowledge. We define TTLs of 10 days, and Cassandra deletes the records for us.

How do you determine the data size before compaction? We look at the disk space being used, and we noticed the increase with: df -H

Do these archives contain the full dataset before the update? Yes, they do.

I will try the suggestions you both mentioned. Here is the table definition:

cqlsh:keyspace1> desc table table1

    /*
    Warning: Table keyspace1.table1 omitted because it has constructs not compatible with CQL (was created via legacy API).
    Approximate structure, for reference:
    (this should not be used to reproduce this schema)

    CREATE TABLE keyspace1.table1 (
        blob1 blob,
        blob2 blob,
        blue3 blob,
        PRIMARY KEY (blob1, blob2)
    ) WITH COMPACT STORAGE
        AND CLUSTERING ORDER BY (blob2 ASC)
        AND additional_write_policy = '99p'
        AND bloom_filter_fp_chance = 0.01
        AND caching = {'keys': 'NONE', 'rows_per_partition': '120'}
        AND cdc = false
        AND comment = ''
        AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
        AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.SnappyCompressor'}
        AND memtable = 'default'
        AND crc_check_chance = 1.0
        AND default_time_to_live = 0
        AND extensions = {}
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair = 'BLOCKING'
        AND speculative_retry = '99p';
    */

Regards,

William Crowell

From: Sartor Fabien (DIN) via user <user@cassandra.apache.org>
Date: Thursday, March 20, 2025 at 6:56 AM
To: user@cassandra.apache.org
Cc: Sartor Fabien (DIN) <fabien.sar...@etat.ge.ch>
Subject: RE: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

Dear William,

I'm also sorry for my poor English!

Compression ratio 0.566… I take it to mean that if you have 1MB of data on your laptop, you get 0.566MB in Cassandra.
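As a rough cross-check, using the numbers from the cfstats output at the bottom of this thread (so treat it as an approximation): 0.566 x 170.426 GiB of uncompressed data is about 96.5 GiB, i.e. roughly 103.6 GB, which lines up with the ~104 GB "Space used (live)" reported there.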
If you put blobs in the table, it is OK! If it is plain English text, it seems low.

Here is how I understand the answers to Bowen's questions below:

* Snapshots could've been created automatically:
  - by dropping or truncating tables when auto_snapshot is set to true: SET TO TRUE
  - or by compaction when snapshot_before_compaction is set to true: SET TO FALSE
* Backups, which could've been created automatically, e.g. when incremental_backups is set to true: SET TO FALSE
* Mixing repaired and unrepaired sstables, which is usually caused by incremental repairs, even if they had only been run once: no incremental repairs done
* Partially upgraded cluster, e.g. mixed Cassandra versions in the same cluster: only 1 node in the cluster
* Token ring change (e.g. adding or removing nodes) without "nodetool cleanup": only 1 node in the cluster
* Changes made to the compression table properties: no change done

I don't have the answers to these questions below from Bowen:

* Do you have any snapshots/backup data in the folder?
  - You can try to run: nodetool listsnapshots
* Actual increase in the size of business data?
  - Maybe a developer used the prod environment to inject data and didn't notice it?

I started working in Cassandra administration only a few months ago, so I may be wrong!

My team recently migrated Cassandra from version 3 to 4.1.4. They didn't observe this behavior. All tables use the Leveled Compaction Strategy, and we do not perform manual compactions.

What I can see is that droppable tombstones are below 5%, which is expected and should not significantly impact the data size. Do we agree with this statement?

Have you recently deleted a large amount of data?

Could you run the following command on the 80GB+ SSTable and some others?

    $CASSANDRA_HOME/tools/bin/sstablemetadata

An SSTable of 80GB+ seems very large. An SSTable ensures that you don't have duplicate data inside, meaning this 80GB+ consists of useful data.

So my question is: how do you determine the data size before compaction? I understand that you perform backups. Could you check the backup archives from before the update? Do these archives contain the full dataset before the update?

Additionally, try parsing the business data and verify that all the data is intact. You can use the following command to inspect a few records:

    $CASSANDRA_HOME/tools/bin/sstabledump -d

Another possibility is to load all the SSTables into another Cassandra instance using:

    $CASSANDRA_HOME/bin/sstableloader

Then check whether you get the same data size.
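For example (a sketch only: the data path and file name are taken from the ls output earlier in this thread, and the table directory suffix is assumed):

    # Summary metadata: min/max timestamps, estimated droppable tombstones, repair state
    $CASSANDRA_HOME/tools/bin/sstablemetadata \
        /var/lib/cassandra/data/keyspace1/table1-*/nb-163033-big-Data.db

    # Spot-check a few partitions, including deletion/TTL info
    $CASSANDRA_HOME/tools/bin/sstabledump -d \
        /var/lib/cassandra/data/keyspace1/table1-*/nb-163033-big-Data.db | head -n 50

    # Stream the table into a scratch instance and compare its size afterwards
    $CASSANDRA_HOME/bin/sstableloader -d <scratch-node-address> \
        /var/lib/cassandra/data/keyspace1/table1-*/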
Thank you,
Best regards,
Fabien

From: Tasmaniedemon <tasmaniede...@free.fr>
Sent: Thursday, March 20, 2025 09:06
To: user@cassandra.apache.org
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

Hi,

Could you give more details about the use of the tables and the data modeling on this single-node Cassandra?

Did you begin using Cassandra on version 3, or had you already migrated from a previous version (2.x) before?

To be honest, I would suggest using the latest release available, and rebuilding and reloading a fresh new cluster with a very low num_tokens (and 3 nodes :-)

May I ask why only a single-node Cassandra? Is scalability not intended?

Sorry for my poor English :-)

Kind regards,
Stephane

On 19/03/2025 at 14:15, William Crowell via user wrote:

Bowen, Fabien, Stéphane, and Luciano,

A bit more information here...

We have not run incremental repairs, and we have not made any changes to the compression properties on the tables.

When we first started the database, the TTL on the records was set to 0, but now it is set to 10 days.

We do have one table in a keyspace that is occupying 84.1GB of disk space:

    ls -l /var/lib/cassandra/data/keyspace1/table1
    …
    -rw-rw-r--. 1 xxxxxxxx xxxxxxxxx 84145170181 Mar 18 08:28 nb-163033-big-Data.db
    …

Regards,

William Crowell

From: William Crowell via user <user@cassandra.apache.org>
Date: Friday, March 14, 2025 at 10:53 AM
To: user@cassandra.apache.org
Cc: William Crowell <wcrow...@perforce.com>; Bowen Song <bo...@bso.ng>
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

Bowen,

This is just a single Cassandra node. Unfortunately, I cannot get on the box at the moment, but the following configuration is in cassandra.yaml:

    snapshot_before_compaction: false
    auto_snapshot: true
    incremental_backups: false

The only other configuration parameter that had been changed, other than the keystore and truststore, was num_tokens (default: 16):

    num_tokens: 256

I also noticed the compression ratio on the largest table is not good: 0.566085855123187

Regards,

William Crowell

From: Bowen Song via user <user@cassandra.apache.org>
Date: Friday, March 14, 2025 at 10:13 AM
To: William Crowell via user <user@cassandra.apache.org>
Cc: Bowen Song <bo...@bso.ng>
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

A few suspects:

* snapshots, which could've been created automatically, such as by dropping or truncating tables when auto_snapshot is set to true, or by compaction when snapshot_before_compaction is set to true

* backups, which could've been created automatically, e.g. when incremental_backups is set to true

* mixing repaired and unrepaired sstables, which is usually caused by incremental repairs, even if they had only been run once

* partially upgraded cluster, e.g. mixed Cassandra versions in the same cluster

* token ring change (e.g. adding or removing nodes) without "nodetool cleanup"

* actual increase in data size

* changes made to the compression table properties

To find the root cause, you will need to check the file/folder sizes to find out what is using the extra disk space, and you may also need to review the cassandra.yaml file (or post it here with sensitive information removed) and any actions you've taken on the cluster prior to the first appearance of the issue.

Also, manually running major compactions is not advised.
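For example, a quick sketch to see where the space is going (paths assume the default data directory layout):

    # Largest table directories first
    du -sh /var/lib/cassandra/data/*/* | sort -rh | head -n 20

    # Snapshots and incremental backups live inside each table directory
    du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null
    du -sh /var/lib/cassandra/data/*/*/backups 2>/dev/null

    nodetool listsnapshots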
On 12/03/2025 20:26, William Crowell via user wrote:

Hi. A few months ago, I upgraded a single-node Cassandra instance from version 3 to 4.1.3. This instance is not very large, with about 15 to 20 gigabytes of data on version 3, but after the upgrade it has grown substantially, to over 100GB. I do a compaction once a week and take a snapshot, but with the increase in data, the compaction is a much lengthier process. I also ran sstableupgrade as part of the upgrade. Any reason for the increased size of the database on the file system?

I am using the default STCS compaction strategy. My "nodetool cfstats" on a heavily used table looks like this:

    Keyspace: xxxxxxxx
        Read Count: 48089
        Read Latency: 12.52872569610514 ms
        Write Count: 1616682825
        Write Latency: 0.0067135265490310386 ms
        Pending Flushes: 0
            Table: sometable
            SSTable count: 13
            Old SSTable count: 0
            Space used (live): 104005524836
            Space used (total): 104005524836
            Space used by snapshots (total): 0
            Off heap memory used (total): 116836824
            SSTable Compression Ratio: 0.566085855123187
            Number of partitions (estimate): 14277177
            Memtable cell count: 81033
            Memtable data size: 13899174
            Memtable off heap memory used: 0
            Memtable switch count: 13171
            Local read count: 48089
            Local read latency: NaN ms
            Local write count: 1615681213
            Local write latency: 0.005 ms
            Pending flushes: 0
            Percent repaired: 0.0
            Bytes repaired: 0.000KiB
            Bytes unrepaired: 170.426GiB
            Bytes pending repair: 0.000KiB
            Bloom filter false positives: 125
            Bloom filter false ratio: 0.00494
            Bloom filter space used: 24656936
            Bloom filter off heap memory used: 24656832
            Index summary off heap memory used: 2827608
            Compression metadata off heap memory used: 89352384
            Compacted partition minimum bytes: 73
            Compacted partition maximum bytes: 61214
            Compacted partition mean bytes: 11888
            Average live cells per slice (last five minutes): NaN
            Maximum live cells per slice (last five minutes): 0
            Average tombstones per slice (last five minutes): NaN
            Maximum tombstones per slice (last five minutes): 0
            Dropped Mutations: 0
            Droppable tombstone ratio: 0.04983
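A couple of quick checks that may help narrow this down (a sketch; the data path is assumed, and the table directory carries an ID suffix). As far as I know, 4.x sstables use an "na-"/"nb-" filename prefix while 3.11 used "mc-"/"md-", so old-format files left behind after sstableupgrade stand out:

    # Any pre-4.x sstables still in the table directory?
    ls /var/lib/cassandra/data/keyspace1/table1-*/ | grep -Ev '^(na|nb)-'

    nodetool listsnapshots      # the weekly snapshots also consume this volume
    nodetool compactionstats    # anything still compacting?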