Hi,

I thought that LCS compaction was not advised for high-write workloads?

Do you know if the driver version has been updated on the client side?

Have a nice day
Kind regards
Stephane


On Fri, Mar 21, 2025 at 12:13, William Crowell via user <
user@cassandra.apache.org> wrote:

> Thomas and Scott,
>
>
>
> Thank you both for your replies.
>
>
>
> I am not noticing any exceptions in the logs, but I will check again.
> That description of table1 is from the current server with the table and
> column names obscured, but that table definition is what we are running
> with.  You are correct here: scalability may have been intended
> initially, but they never added nodes to their ring.
>
>
>
> I would say that the increase in disk usage is probably around 10-15%:
> 90GB on 3.x to a little over 100GB on 4.x.  I am wondering how much
> extra metadata is being stored with 4.x?
>
>
>
> I am wondering if it would be better to create a new keyspace and table
> using LCS instead of STCS, and LZ4 instead of Snappy, as Thomas
> suggested?
>
>
>
> Regards,
>
>
>
> William Crowell
>
>
>
> *From: *C. Scott Andreas <sc...@paradoxica.net>
> *Date: *Friday, March 21, 2025 at 12:21 AM
> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
> *Cc: *user@cassandra.apache.org <user@cassandra.apache.org>
> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
>
> While it's true that COMPACT STORAGE was deprecated in Cassandra 4.0+, it
> was not removed. It's largely supported and works just fine for many use
> cases.
>
>
>
> Feedback from users indicated that removing it entirely would be
> disruptive because it would have required special effort from Cassandra
> users upgrading to 4.x. Much of COMPACT STORAGE was reintroduced in
> Cassandra 4.x, and works just fine – including in 5.0 and trunk.
>
>
>
> You can find details on that work here:
> https://issues.apache.org/jira/browse/CASSANDRA-16217
>
>
>
> Excerpting the docs added at the time:
>
>
> https://github.com/apache/cassandra/commit/0a49d25078665da0ec30d9e69a036de163deb9c3#diff-6a259516d12c778b2b08ec408b884a88bf94d14a9f6dd8d943c6e4376288da6dR479-R480
>
>
>
> ––––––
>
> A compact storage table is one defined with the COMPACT STORAGE option.
> This option is only maintained for backward compatibility for definitions
> created before CQL version 3 and shouldn't be used for new tables. 4.0
> supports COMPACT STORAGE only partially. There is no support for super
> column families. Since Cassandra 3.0, compact storage tables have the
> exact same internal layout as non-compact ones (for the same schema,
> obviously), and declaring a table with this option creates limitations
> for the table which are largely arbitrary (and exist for historical
> reasons).
>
>
>
> Amongst those limitations:
>
>
>
>   - A compact storage table cannot use collections nor static columns.
>
>   - If a compact storage table has at least one clustering column, then
> it must have *exactly* one column outside of the primary key ones. In
> particular, this implies you cannot add or remove columns after creation.
>
>   - A compact storage table is limited in the indexes it can create, and
> no materialized view can be created on it.
>
> ––––––
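>
> (For completeness: if the goal were eventually to move off of it, CQL
> has a dedicated statement for that; a sketch, using the obscured names
> from this thread:
>
>     ALTER TABLE keyspace1.table1 DROP COMPACT STORAGE;
>
> Since the on-disk layout is already identical, this only changes the
> schema, not the sstables.)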
>
>
>
>
>
>
>
> On Mar 20, 2025, at 8:43 PM, Thomas Elliott <asf.telli...@gmail.com>
> wrote:
>
>
>
>
>
> I'm now wondering how your Cassandra is starting with this table
> configuration. Part of migrating from 3.x should have dropped COMPACT
> STORAGE from your tables.
>
>
>
> Do you find any notice of exceptions when Cassandra starts up?  Was the
> description of table1 that you shared from the current server, or from
> before the upgrade?
>
>
>
> .thomas
>
>
>
> On Thu, Mar 20, 2025 at 6:59 PM Thomas Elliott <asf.telli...@gmail.com>
> wrote:
>
> William,
>
>
>
> Longtime listener, first time poster.
>
>
>
> Something stands out here for me.  This table uses COMPACT STORAGE,
> which is deprecated in Cassandra 4.x and is a holdover from the old
> Thrift API.  I'm sure this is eliciting some gasps from the mailing list.
>
>
>
> You might actually be better served by creating a new keyspace and
> table, using Leveled Compaction instead of STCS and LZ4 instead of
> Snappy.
>
>
>
> After copying the data over to the new table, run a compaction and repair
> on the table, and I expect that you should see the footprint decrease.
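>
> For the copy step on a dataset of this size, cqlsh's COPY would do (a
> sketch; the new keyspace name keyspace2 is hypothetical, and COPY is
> only practical for modest volumes like this one):
>
>     # export from the old table, then import into the new one
>     cqlsh -e "COPY keyspace1.table1 (blob1, blob2, blob3) TO '/tmp/table1.csv';"
>     cqlsh -e "COPY keyspace2.table1 (blob1, blob2, blob3) FROM '/tmp/table1.csv';"
>     # then compact and repair the new table, as suggested above
>     nodetool compact keyspace2 table1
>     nodetool repair keyspace2 table1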
>
>
>
> What I'm concerned about is that you have COMPACT STORAGE, a legacy
> Thrift-era construct that is deprecated in 4.0; you also have wide rows
> and a large partition, which exacerbates the compact storage issue.
>
>
>
> I have to keep coming back to the fact that this is a single instance,
> single replica with a small volume of data, so I shouldn't be thinking
> about the scalability of your data model.  That said, your data footprint
> between Cassandra 3.x and 4.x shouldn't show a 10x difference.  It should
> be somewhat bigger, because 4.0 introduces additional metadata, but not
> THAT much bigger.
>
>
>
> I took a stab at a potential new schema for you and this might work
> better...
>
>
>
> CREATE TABLE keyspace1.table1 (
>     blob1 blob,                           -- Partition key
>     blob2 blob,                           -- Clustering key
>     blob3 blob,                           -- Data column
>     PRIMARY KEY (blob1, blob2)            -- Composite primary key
> )
> WITH CLUSTERING ORDER BY (blob2 ASC)      -- Maintain clustering order
> AND compaction = {
>     'class': 'LeveledCompactionStrategy', -- Optimized for read-heavy workloads
>     'sstable_size_in_mb': '160'           -- Default size for LCS
> }
> AND compression = {
>     'class': 'LZ4Compressor',
>     'chunk_length_in_kb': '64'
> }
> AND caching = {
>     'keys': 'ALL',                        -- Cache all partition keys
>     'rows_per_partition': 'NONE'          -- Avoid caching rows to save memory
> }
> AND gc_grace_seconds = 864000
> AND default_time_to_live = 0
> AND speculative_retry = '99p'
> AND bloom_filter_fp_chance = 0.01
> AND crc_check_chance = 1.0
> AND memtable_flush_period_in_ms = 0;
>
>
>
>
>
> All the best,
>
> .thomas
>
>
>
> On Thu, Mar 20, 2025 at 8:36 AM Michalis Kotsiouros (EXT) via user <
> user@cassandra.apache.org> wrote:
>
> Hello William,
>
> Does your data get updated? You mentioned in one email that you switched
> from TTL 0 to 10 days. This might mean that some data is not deleted
> because at some point in time it was written with no TTL (TTL set to 0).
>
> This has occurred in some systems in the past.
>
> You may use the sstableexpiredblocker tool to check about this.
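>
> For reference, it takes the keyspace and table as arguments, so with the
> obscured names used elsewhere in this thread it would look like:
>
>     $CASSANDRA_HOME/tools/bin/sstableexpiredblocker keyspace1 table1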
>
>
>
> BR
>
> MK
>
>
>
> *From:* William Crowell via user <user@cassandra.apache.org>
> *Sent:* March 20, 2025 13:58
> *To:* user@cassandra.apache.org
> *Cc:* William Crowell <wcrow...@perforce.com>; Sartor Fabien (DIN) <
> fabien.sar...@etat.ge.ch>; tasmaniede...@free.fr
> *Subject:* Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
>
>
> Stephane and Fabien,
>
>
>
> Good morning and thank you for your replies.  Your English is fine.  I
> just appreciate the help.
>
>
>
> Apache Cassandra was overkill for this application, and this was just
> something handed to me.  This is an issue because of the limited amount
> of disk space they specified when creating the volume that Cassandra
> resides on, which cannot be expanded.  We still have plenty of disk
> space, but with this growth we will eventually run out.  Trying to be
> proactive here.
>
>
>
> I do not have any snapshots when doing nodetool listsnapshots, nor backup
> data in the backup folder.  It could be an actual increase in data size,
> and I am not aware of any data being injected into the table by anyone
> else.  That's something to check into, though.
>
>
>
> We initially created this table with Cassandra 3.x and then migrated to
> 4.1.3.
>
>
>
> I will paste the table description below; I have changed the table and
> column names only.  I would never define a table like this, with the
> primary key being 2 blobs.  I would have at least created a uuid field
> as the primary key.
>
>
>
> The compression ratio is not very good because there are 3 blob columns in
> the table.  When I first looked at this, I felt LeveledCompactionStrategy
> would have been a better fit instead of STCS.
>
>
>
> I do agree and see that droppable tombstones are below 5% which is
> expected and should not significantly impact the data size.
>
>
>
> Have we recently deleted a large amount of data?  Not to my knowledge.  We
> define TTLs of 10 days, and Cassandra deletes the records for us.
>
>
>
> How do you determine the data size before compaction?  We look at the
> disk space being used and noticed the increase: df -H
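>
> A more table-scoped check than df -H would be something like this (the
> data path shown is the default and may differ on our box):
>
>     du -sh /var/lib/cassandra/data/keyspace1/table1-*/
>     nodetool tablestats keyspace1.table1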
>
>
>
> Do these archives contain the full dataset before the update?  Yes they do.
>
>
>
> I will try the suggestions you both mentioned.  Here is the table
> definition:
>
>
>
> cqlsh:keyspace1> desc table table1
>
>
>
> /*
> Warning: Table keyspace1.table1 omitted because it has constructs not
> compatible with CQL (was created via legacy API).
> Approximate structure, for reference:
> (this should not be used to reproduce this schema)
>
> CREATE TABLE keyspace1.table1 (
>     blob1 blob,
>     blob2 blob,
>     blob3 blob,
>     PRIMARY KEY (blob1, blob2)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (blob2 ASC)
>     AND additional_write_policy = '99p'
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'NONE', 'rows_per_partition': '120'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '16', 'class':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>     AND memtable = 'default'
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> */
>
>
>
> Regards,
>
>
>
> William Crowell
>
>
>
> *From: *Sartor Fabien (DIN) via user <user@cassandra.apache.org>
> *Date: *Thursday, March 20, 2025 at 6:56 AM
> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
> *Cc: *Sartor Fabien (DIN) <fabien.sar...@etat.ge.ch>
> *Subject: *RE: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
> Dear William,
>
>
>
> I'm also sorry for my poor English!
>
>
>
> Compression ratio 0.566… I guess it means that if you have 1MB of data
> on your laptop, you get 0.566MB in Cassandra.
>
> If you put blobs in the table, that is OK!
>
> If it is plain English text, that seems low.
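>
> As a sanity check against the cfstats quoted further down this thread:
> 170.426 GiB of uncompressed data × 0.566 ≈ 96.5 GiB, which is close to
> the 104,005,524,836 bytes (≈96.9 GiB) of live space reported there, so
> the ratio and the on-disk size are at least consistent with each other.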
>
>
>
> I understand the answers to the questions below from Bowen:
>
>
>
> - snapshots could've been created automatically,
>
>   - such as by dropping or truncating tables when auto_snapshot is set
> to true: SET TO TRUE
>
>   - or by compaction when snapshot_before_compaction is set to true: SET
> TO FALSE
>
> - backups, which could've been created automatically, e.g. when
> incremental_backups is set to true: SET TO FALSE
>
> - mixing repaired and unrepaired sstables, which is usually caused by
> incremental repairs, even if they had only been run once: no incremental
> repairs done
>
> - partially upgraded cluster, e.g. mixed Cassandra versions in the same
> cluster: only 1 node in the cluster
>
> - token ring change (e.g. adding or removing nodes) without "nodetool
> cleanup": only 1 node in the cluster
>
> - changes made to the compression table properties: no change done
>
>
>
> I don't have the answers to these questions below from Bowen:
>
> - do you have any snapshots/backup data in the folder?
>
>   - You can try to run: nodetool listsnapshots (see the sketch below)
>
> - actual increase in data size of business data?
>
>   - maybe a developer used the prod environment to inject data and
> didn't notice it?
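>
> A quick way to check the first one (a sketch; this assumes the default
> data directory):
>
>     nodetool listsnapshots
>     du -sh /var/lib/cassandra/data/*/*/snapshots \
>            /var/lib/cassandra/data/*/*/backups 2>/dev/null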
>
>
>
> I only started working in Cassandra administration a few months ago,
>
> so I may be wrong!
>
>
>
> My team recently migrated Cassandra from version 3 to 4.1.4.
>
> They didn't observe this behavior.
>
> All tables use the Leveled Compaction Strategy, and we do not perform
> manual compactions.
>
>
>
> What I can see is that droppable tombstones are below 5%, which is
> expected and should not significantly impact the data size.
>
> Do we agree with this statement?
>
> Have you recently deleted a large amount of data?
>
> Could you run the following command on the 80GB+ SSTable and some others?
>
>     *$CASSANDRA_HOME/tools/bin/sstablemetadata*
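>
> Against the 84GB data file quoted further down in this thread, that
> would be something like (path assumed to be the default layout):
>
>     $CASSANDRA_HOME/tools/bin/sstablemetadata \
>         /var/lib/cassandra/data/keyspace1/table1-*/nb-163033-big-Data.db
>
> Among other things it reports the estimated droppable tombstones and
> the repaired-at timestamp.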
>
>
>
>
>
> An SSTable of 80GB+ seems very large.
>
> A single SSTable contains no duplicate rows, meaning this 80GB+ consists
> of useful data.
>
>
>
> So my question is: How do you determine the data size before compaction?
>
> I understand that you perform backups. Could you check the backup
> archives from before the update?
>
> Do these archives contain the full dataset before the update?
>
>
>
> Additionally, try parsing the business data and verify that all the
> data is intact.
>
> You can use the following command to inspect a few records:
>
>
>
> *$CASSANDRA_HOME/tools/bin/sstabledump -d*
>
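>
> For example (the -d flag dumps the internal row representation, one row
> per line; head keeps the sample small; path assumed as above):
>
>     $CASSANDRA_HOME/tools/bin/sstabledump -d \
>         /var/lib/cassandra/data/keyspace1/table1-*/nb-163033-big-Data.db | head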
>
>
>
>
>
>
> Another possibility is to load all the SSTables into another Cassandra
> instance using:
>
>
>
> *$CASSANDRA_HOME/bin/sstableloader*
>
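>
> Roughly (a sketch; the target address is whatever the receiving instance
> listens on, and the path is assumed):
>
>     $CASSANDRA_HOME/bin/sstableloader -d <target_node_ip> \
>         /var/lib/cassandra/data/keyspace1/table1-<table_id>/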
>
>
> Then, check if you get the same data size.
>
>
>
> Thank you,
>
> Best regards,
>
> Fabien
>
>
>
>
>
> *From:* Tasmaniedemon <tasmaniede...@free.fr>
> *Sent:* Thursday, March 20, 2025 09:06
> *To:* user@cassandra.apache.org
> *Subject:* Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
>
>
>
>
>
> Hi,
>
> Could you give more details about the table usage and data modeling on
> this single-node Cassandra?
>
> Did you begin using Cassandra with version 3, or had you already
> migrated from a previous version (2.x)?
>
> To be honest, I would suggest using the latest release available, and
> rebuilding and reloading a fresh new cluster with a very low num_tokens
> (and 3 nodes :-)
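>
> (Concretely, that would mean setting e.g. num_tokens: 16, the 4.x
> default, in cassandra.yaml on the new nodes, instead of the 256
> configured here.)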
>
> May I ask why only a single-node Cassandra? Is scalability not
> intended?
>
> Sorry for my poor English :-)
>
> Kind regards
>
> Stephane
>
>
>
>
>
> On 19/03/2025 at 14:15, William Crowell via user wrote:
>
> Bowen, Fabien, Stéphane, and Luciano,
>
>
>
> A bit more information here...
>
>
>
> We have not run incremental repairs, and we have not made any changes to
> the compression properties on the tables.
>
>
>
> When we first started the database, the TTL on the records was set to
> 0, but now it is set to 10 days.
>
>
>
> We do have one table in a keyspace that is occupying 84.1GB of disk space:
>
>
>
> ls -l /var/lib/cassandra/data/keyspace1/table1
>
> …
>
> -rw-rw-r--. 1 xxxxxxxx xxxxxxxxx 84145170181 Mar 18 08:28
> nb-163033-big-Data.db
>
> …
>
>
>
> Regards,
>
>
>
> William Crowell
>
>
>
> *From: *William Crowell via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Date: *Friday, March 14, 2025 at 10:53 AM
> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Cc: *William Crowell <wcrow...@perforce.com> <wcrow...@perforce.com>,
> Bowen Song <bo...@bso.ng> <bo...@bso.ng>
> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
> Bowen,
>
>
>
> This is just a single Cassandra node.  Unfortunately, I cannot get on the
> box at the moment, but the following configuration is in cassandra.yaml:
>
>
>
> snapshot_before_compaction: false
> auto_snapshot: true
> incremental_backups: false
>
>
>
> The only other configuration parameter that had been changed other than
> the keystore and truststore was num_tokens (default: 16):
>
>
>
> num_tokens: 256
>
>
>
> I also noticed the compression ratio on the largest table is not good:
>  0.566085855123187
>
>
>
> Regards,
>
>
>
> William Crowell
>
>
>
> *From: *Bowen Song via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Date: *Friday, March 14, 2025 at 10:13 AM
> *To: *William Crowell via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Cc: *Bowen Song <bo...@bso.ng> <bo...@bso.ng>
> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
> to 4.1.3
>
> A few suspects:
>
> * snapshots, which could've been created automatically, such as by
> dropping or truncating tables when auto_snapshot is set to true, or by
> compaction when snapshot_before_compaction is set to true
>
> * backups, which could've been created automatically, e.g. when
> incremental_backups is set to true
>
> * mixing repaired and unrepaired sstables, which is usually caused by
> incremental repairs, even if they had only been run once
>
> * partially upgraded cluster, e.g. mixed Cassandra versions in the same
> cluster
>
> * token ring change (e.g. adding or removing nodes) without "nodetool
> cleanup"
>
> * actual increase in data size
>
> * changes made to the compression table properties
>
>
>
> To find the root cause, you will need to check the file/folder sizes to
> find out what is using the extra disk space, and may also need to review
> the cassandra.yaml file (or post it here with sensitive information
> removed) and any actions you've made to the cluster prior to the first
> appearance of the issue.
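>
> A starting point for that file/folder check could be (data path
> assumed):
>
>     du -h --max-depth=3 /var/lib/cassandra/data | sort -h | tail -20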
>
>
>
> Also, manually running major compactions is not advised.
>
> On 12/03/2025 20:26, William Crowell via user wrote:
>
> Hi.  A few months ago, I upgraded a single-node Cassandra instance from
> version 3 to 4.1.3.  This instance is not very large, with about 15 to
> 20 gigabytes of data on version 3, but after the upgrade it has gone up
> substantially, to over 100GB.  I do a compaction once a week and take a
> snapshot, but with the increase in data the compaction is a much
> lengthier process.  I also ran sstableupgrade as part of the upgrade.
> Any reason for the increased size of the database on the file system?
>
>
>
> I am using the default STCS compaction strategy.  My “nodetool cfstats” on
> a heavily used table looks like this:
>
>
>
> Keyspace : xxxxxxxx
>         Read Count: 48089
>         Read Latency: 12.52872569610514 ms
>         Write Count: 1616682825
>         Write Latency: 0.0067135265490310386 ms
>         Pending Flushes: 0
>                 Table: sometable
>                 SSTable count: 13
>                 Old SSTable count: 0
>                 Space used (live): 104005524836
>                 Space used (total): 104005524836
>                 Space used by snapshots (total): 0
>                 Off heap memory used (total): 116836824
>                 SSTable Compression Ratio: 0.566085855123187
>                 Number of partitions (estimate): 14277177
>                 Memtable cell count: 81033
>                 Memtable data size: 13899174
>                 Memtable off heap memory used: 0
>                 Memtable switch count: 13171
>                 Local read count: 48089
>                 Local read latency: NaN ms
>                 Local write count: 1615681213
>                 Local write latency: 0.005 ms
>                 Pending flushes: 0
>                 Percent repaired: 0.0
>                 Bytes repaired: 0.000KiB
>                 Bytes unrepaired: 170.426GiB
>                 Bytes pending repair: 0.000KiB
>                 Bloom filter false positives: 125
>                 Bloom filter false ratio: 0.00494
>                 Bloom filter space used: 24656936
>                 Bloom filter off heap memory used: 24656832
>                 Index summary off heap memory used: 2827608
>                 Compression metadata off heap memory used: 89352384
>                 Compacted partition minimum bytes: 73
>                 Compacted partition maximum bytes: 61214
>                 Compacted partition mean bytes: 11888
>                 Average live cells per slice (last five minutes): NaN
>                 Maximum live cells per slice (last five minutes): 0
>                 Average tombstones per slice (last five minutes): NaN
>                 Maximum tombstones per slice (last five minutes): 0
>                 Dropped Mutations: 0
>                 Droppable tombstone ratio: 0.04983
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
