Why not use the *nodetool import* tool?
Create a new table, copy the SSTable files from the old table's directory to
the new table's directory, and then run nodetool import:
*nodetool import [keyspace] [target_table] [target_table_directory]*
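A minimal sketch of that procedure; the keyspace, table, and directory names below are placeholders, and the new table must already exist with a matching schema:

```shell
# 1. Create the new, empty table in cqlsh with the desired schema.
# 2. Make sure the source directory holds complete SSTable sets
#    (Data.db, Index.db, Statistics.db, ... components).
# 3. Import them into the new table (placeholder names/paths):
nodetool import keyspace1 new_table /var/lib/cassandra/data/keyspace1/old_table-1a2b3c/
```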

edi

On Mon, Mar 31, 2025 at 5:22 PM Stéphane Alleaume <tasmaniede...@free.fr>
wrote:

> Hi
>
> I would do the same
>
> Kind regards
> Stéphane
>
>
> On 31 March 2025 at 16:13:01 GMT+02:00, "Durity, Sean R via user" <
> user@cassandra.apache.org> wrote:
>
>> I would use dsbulk for the unload and load of data. It is very fast and
>> specific to a table.
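As a sketch of that workflow (keyspace/table names and the export directory are placeholders; dsbulk writes CSV by default, with blob values encoded as base64):

```shell
# Export the table to CSV files.
dsbulk unload -k keyspace1 -t table1 -url /tmp/table1_export

# ...drop and recreate the table in cqlsh...

# Reload the exported data into the recreated table.
dsbulk load -k keyspace1 -t table1 -url /tmp/table1_export
```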
>>
>>
>>
>>
>>
>> Sean R. Durity
>>
>>
>>
>> *From:* William Crowell via user <user@cassandra.apache.org>
>> *Sent:* Monday, March 31, 2025 10:05 AM
>> *To:* user@cassandra.apache.org <user@cassandra.apache.org>; Stéphane
>> Alleaume <crystallo...@gmail.com>
>> *Cc:* William Crowell <wcrow...@perforce.com>; sc...@paradoxica.net <
>> sc...@paradoxica.net>; asf.telli...@gmail.com <asf.telli...@gmail.com>
>> *Subject:* [EXTERNAL] Re: Increased Disk Usage After Upgrading From
>> Cassandra 3.x.x to 4.1.3
>>
>>
>> Good morning.  What would be the best way to dump a large table (about
>> 90GB), drop the table, recreate the table, and then reload the data?
>>
>>
>>
>> I am looking through this documentation:
>> https://cassandra.apache.org/doc/4.1/cassandra/operating/backups.html
>>
>>
>>
>> To do the restore, it looks like I would need to take a snapshot and then
>> drop the table, recreate the table with the CQL that Thomas provided, and
>> then use either sstableloader or nodetool import to reload it.  Does that
>> make sense?
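That sequence could look like the following sketch (snapshot tag, host, and paths are placeholders; note that sstableloader infers the keyspace and table from the last two directory levels, so snapshot files usually need to be staged under a keyspace1/table1/ directory first):

```shell
# 1. Snapshot the table so its current SSTables are preserved.
nodetool snapshot -t pre_migration -kt keyspace1.table1

# 2. Drop and recreate the table in cqlsh with the new schema.

# 3. Stage the snapshot files and stream them back in:
mkdir -p /tmp/staging/keyspace1/table1
cp /var/lib/cassandra/data/keyspace1/table1-*/snapshots/pre_migration/* \
   /tmp/staging/keyspace1/table1/
sstableloader -d 127.0.0.1 /tmp/staging/keyspace1/table1
```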
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *William Crowell via user <user@cassandra.apache.org>
>> *Date: *Monday, March 24, 2025 at 3:07 PM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>, Stéphane
>> Alleaume <crystallo...@gmail.com>
>> *Cc: *William Crowell <wcrow...@perforce.com>, sc...@paradoxica.net <
>> sc...@paradoxica.net>, asf.telli...@gmail.com <asf.telli...@gmail.com>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>> Surprisingly, sstableexpiredblockers returned nothing.  I am looking into
>> recreating a new keyspace and table using LCS instead of STCS.
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *William Crowell via user <user@cassandra.apache.org>
>> *Date: *Friday, March 21, 2025 at 7:36 AM
>> *To: *Stéphane Alleaume <crystallo...@gmail.com>,
>> user@cassandra.apache.org <user@cassandra.apache.org>
>> *Cc: *William Crowell <wcrow...@perforce.com>, sc...@paradoxica.net <
>> sc...@paradoxica.net>, asf.telli...@gmail.com <asf.telli...@gmail.com>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>> Stéphane,
>>
>>
>>
>> I will check.  What issues could that cause if the driver on the client
>> side was outdated?
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *Stéphane Alleaume <crystallo...@gmail.com>
>> *Date: *Friday, March 21, 2025 at 7:32 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Cc: *sc...@paradoxica.net <sc...@paradoxica.net>, asf.telli...@gmail.com
>> <asf.telli...@gmail.com>, William Crowell <wcrow...@perforce.com>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>>
>> Hi,
>>
>>
>>
>> I thought that the LCS compaction strategy was not advised for high-write
>> workloads?
>>
>>
>>
>> Do you know if the driver version has been updated on the client side?
>>
>>
>>
>> Have a nice day
>>
>> Kind regards
>>
>> Stephane
>>
>>
>>
>> On Fri, 21 Mar 2025 at 12:13, William Crowell via user <
>> user@cassandra.apache.org> wrote:
>>
>> Thomas and Scott,
>>
>>
>>
>> Thank you both for your replies.
>>
>>
>>
>> I am not noticing any exceptions in the logs, but I will check again.
>> That description of table1 is from the current server with the table and
>> column names obscured, but that table definition is what we are running
>> with.  You are correct here… scalability was perhaps intended initially,
>> but they never added nodes to their ring.
>>
>>
>>
>> I would say that the increase in disk usage is probably around 10-15%.
>> 90GB in 3.x to a little over 100GB with 4.x.  I am wondering how much
>> extra metadata is being stored with 4.x?
>>
>>
>>
>> I am wondering if it would be better to create a new keyspace and table
>> using LCS instead of STCS, and LZ4 instead of Snappy, as Thomas suggested?
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *C. Scott Andreas <sc...@paradoxica.net>
>> *Date: *Friday, March 21, 2025 at 12:21 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Cc: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>>
>> While it's true that COMPACT STORAGE was deprecated in Cassandra 4.0+, it
>> was not removed. It's largely supported and works just fine for many use
>> cases.
>>
>>
>>
>> Feedback from users indicated that removing it entirely would be
>> disruptive because it required special effort from Cassandra users on
>> upgrade to 4.x. Much of COMPACT STORAGE was reintroduced in Cassandra 4.x,
>> and works just fine – including in 5.0 and trunk.
>>
>>
>>
>> You can find details on that work here:
>> https://issues.apache.org/jira/browse/CASSANDRA-16217
>>
>>
>>
>> Excerpting the docs added at the time:
>>
>>
>> https://github.com/apache/cassandra/commit/0a49d25078665da0ec30d9e69a036de163deb9c3#diff-6a259516d12c778b2b08ec408b884a88bf94d14a9f6dd8d943c6e4376288da6dR479-R480
>>
>>
>>
>> ––––––
>>
>> A compact storage table is one defined with the COMPACT STORAGE option.
>> This option is only maintained for backward compatibility for definitions
>> created before CQL version 3 and shouldn't be used for new tables. 4.0
>> supports partially COMPACT STORAGE. There is no support for super column
>> family. Since Cassandra 3.0, compact storage tables have the exact same
>> layout internally than non compact ones (for the same schema obviously),
>> and declaring a table with this option creates limitations for the table
>> which are largely arbitrary (and exists for historical reasons).
>>
>>
>>
>> Amongst those limitations:
>>
>>
>>
>>   - A compact storage table cannot use collections nor static columns.
>>
>>   - If a compact storage table has at least one clustering column, then
>> it must have *exactly* one column outside of the primary key ones. This
>> implies you cannot add or remove columns after creation in particular.
>>
>>   - A compact storage table is limited in the indexes it can create, and
>> no materialized view can be created on it.
>>
>> ––––––
>>
>>
>>
>>
>>
>>
>>
>> On Mar 20, 2025, at 8:43 PM, Thomas Elliott <asf.telli...@gmail.com>
>> wrote:
>>
>>
>>
>>
>>
>> I'm now wondering how your Cassandra is starting with this table
>> configuration.  Part of migrating from 3.x should have dropped COMPACT
>> STORAGE from your tables.
>>
>>
>>
>> Do you find any exceptions logged when Cassandra starts up?  Was the
>> description of table1 that you shared from the current server or from
>> before the upgrade?
>>
>>
>>
>> .thomas
>>
>>
>>
>> On Thu, Mar 20, 2025 at 6:59 PM Thomas Elliott <asf.telli...@gmail.com>
>> wrote:
>>
>> William,
>>
>>
>>
>> Longtime listener, first time poster.
>>
>>
>>
>> Something stands out here for me.  This table uses COMPACT STORAGE, which
>> is deprecated in Cassandra 4.x and is a holdover from the old Thrift API.
>> I'm sure that this is eliciting some gasps from the mailing list.
>>
>>
>>
>> You might actually be better served by creating a new keyspace and table,
>> using Leveled Compaction instead of STCS and LZ4 instead of Snappy.
>>
>>
>>
>> After copying the data over to the new table, run a compaction and repair
>> on the table, and I expect that you should see the footprint decrease.
>>
>>
>>
>> What I'm concerned about is that you have COMPACT STORAGE from the legacy
>> Thrift API, which isn't compatible with 4.0; you also have wide columns
>> and a large partition, which exacerbates the compact storage issue.
>>
>>
>>
>> I have to keep coming back to the fact that this is a single instance,
>> single replica, and a small volume of data, so I shouldn't be thinking
>> about the scalability of your data model.  That said, your data footprint
>> between Cassandra 3.x and 4.x shouldn't be 10x different.  It should be
>> somewhat bigger because 4.0 introduces additional metadata, but not THAT
>> much bigger.
>>
>>
>>
>> I took a stab at a potential new schema for you and this might work
>> better...
>>
>>
>>
>> CREATE TABLE keyspace1.table1 (
>>
>>     blob1 blob,                          -- Partition key
>>
>>     blob2 blob,                          -- Clustering key
>>
>>     blue3 blob,                          -- Data column
>>
>>     PRIMARY KEY (blob1, blob2)           -- Composite primary key
>>
>> )
>>
>> WITH CLUSTERING ORDER BY (blob2 ASC)      -- Maintain clustering order
>>
>> AND compaction = {
>>
>>     'class': 'LeveledCompactionStrategy', -- Better for read-heavy workloads
>>
>>     'sstable_size_in_mb': '160'           -- Default size for LCS
>>
>> }
>>
>> AND compression = {
>>
>>     'class': 'LZ4Compressor',
>>
>>     'chunk_length_in_kb': '64'
>>
>> }
>>
>> AND caching = {
>>
>>     'keys': 'ALL',                        -- Cache all partition keys
>>
>>     'rows_per_partition': 'NONE'          -- Avoid caching rows to save 
>> memory
>>
>> }
>>
>> AND gc_grace_seconds = 864000
>>
>> AND default_time_to_live = 0
>>
>> AND speculative_retry = '99p'
>>
>> AND bloom_filter_fp_chance = 0.01
>>
>> AND crc_check_chance = 1.0
>>
>> AND memtable_flush_period_in_ms = 0;
>>
>>
>>
>>
>>
>> All the best,
>>
>> .thomas
>>
>>
>>
>> On Thu, Mar 20, 2025 at 8:36 AM Michalis Kotsiouros (EXT) via user <
>> user@cassandra.apache.org> wrote:
>>
>> Hello William,
>>
>> Does your data get updated? You mentioned in one email that you switched
>> from TTL 0 to 10 days. This might mean that some data are not being
>> deleted because at some point in time they were written with no TTL (TTL
>> set to 0).
>>
>> This has occurred in some systems in the past.
>>
>> You may use the sstableexpiredblockers tool to check this.
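For reference, the tool is invoked with a keyspace and table (placeholder names below):

```shell
# Reports which newer SSTables are blocking fully expired SSTables
# from being dropped.
$CASSANDRA_HOME/tools/bin/sstableexpiredblockers keyspace1 table1
```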
>>
>>
>>
>> BR
>>
>> MK
>>
>>
>>
>> *From:* William Crowell via user <user@cassandra.apache.org>
>> *Sent:* March 20, 2025 13:58
>> *To:* user@cassandra.apache.org
>> *Cc:* William Crowell <wcrow...@perforce.com>; Sartor Fabien (DIN) <
>> fabien.sar...@etat.ge.ch>; tasmaniede...@free.fr
>> *Subject:* Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>>
>>
>> Stephane and Fabian,
>>
>>
>>
>> Good morning and thank you for your replies.  Your English is fine.  I
>> just appreciate the help.
>>
>>
>>
>> Apache Cassandra was overkill for this application, and this was just
>> something handed to me.  This is an issue because of the limited amount of
>> disk space specified when creating the volume that Cassandra resides on,
>> which cannot be expanded.  We still have plenty of disk space, but with
>> this growth we will eventually run out.  Trying to be proactive here.
>>
>>
>>
>> I do not have any snapshots when running nodetool listsnapshots, nor
>> backup data in the backup folder.  It could be an actual increase in data
>> size, and I am not aware of any data being injected into the table by
>> anyone else.  That’s something to check into, though.
>>
>>
>>
>> We initially created this table with Cassandra 3.x and then migrated to
>> 4.1.3.
>>
>>
>>
>> I will paste the table description below, and I have changed the table
>> and column names only.  I would never define a table like this with the
>> primary key being 2 blobs.  I would have at least created a uuid field as a
>> primary key.
>>
>>
>>
>> The compression ratio is not very good because there are 3 blob columns
>> in the table.  When I first looked at this, I felt
>> LeveledCompactionStrategy would have been a better fit instead of STCS.
>>
>>
>>
>> I do agree and see that droppable tombstones are below 5% which is
>> expected and should not significantly impact the data size.
>>
>>
>>
>> Have we recently deleted a large amount of data?  Not to my knowledge.
>> We define TTLs of 10 days, after which Cassandra deletes the records for us.
>>
>>
>>
>> How do you determine the data size before compaction?  We looked at the
>> disk space being used and noticed the increase: df -H
>>
>>
>>
>> Do these archives contain the full dataset before the update?  Yes they
>> do.
>>
>>
>>
>> I will try the suggestions you both mentioned.  Here is the table
>> definition :
>>
>>
>>
>> cqlsh:keyspace1> desc table table1
>>
>>
>>
>> /*
>>
>> Warning: Table keyspace1.table1 omitted because it has constructs not
>> compatible with CQL (was created via legacy API).
>>
>> Approximate structure, for reference:
>>
>> (this should not be used to reproduce this schema)
>>
>>
>>
>> CREATE TABLE keyspace1.table1 (
>>
>>     blob1 blob,
>>
>>     blob2 blob,
>>
>>     blue3 blob,
>>
>>     PRIMARY KEY (blob1, blob2)
>>
>> ) WITH COMPACT STORAGE
>>
>>     AND CLUSTERING ORDER BY (blob2 ASC)
>>
>>     AND additional_write_policy = '99p'
>>
>>     AND bloom_filter_fp_chance = 0.01
>>
>>     AND caching = {'keys': 'NONE', 'rows_per_partition': '120'}
>>
>>     AND cdc = false
>>
>>     AND comment = ''
>>
>>     AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32', 'min_threshold': '4'}
>>
>>     AND compression = {'chunk_length_in_kb': '16', 'class':
>> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>>
>>     AND memtable = 'default'
>>
>>     AND crc_check_chance = 1.0
>>
>>     AND default_time_to_live = 0
>>
>>     AND extensions = {}
>>
>>     AND gc_grace_seconds = 864000
>>
>>     AND max_index_interval = 2048
>>
>>     AND memtable_flush_period_in_ms = 0
>>
>>     AND min_index_interval = 128
>>
>>     AND read_repair = 'BLOCKING'
>>
>>     AND speculative_retry = '99p';
>>
>> */
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *Sartor Fabien (DIN) via user <user@cassandra.apache.org>
>> *Date: *Thursday, March 20, 2025 at 6:56 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> *Cc: *Sartor Fabien (DIN) <fabien.sar...@etat.ge.ch>
>> *Subject: *RE: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>> Dear William,
>>
>>
>>
>> I'm also sorry for my poor English!
>>
>>
>>
>> A compression ratio of 0.566… I guess it means that if you have 1MB of
>> data on your laptop, you get 0.566MB in Cassandra.
>>
>> If you put blobs in the table, that is OK!
>>
>> If it is plain English text, it seems low.
>>
>>
>>
>> My understanding of the answers to Bowen's questions below:
>>
>> - snapshots could've been created automatically,
>>   - such as by dropping or truncating tables when auto_snapshot is set
>>     to true: SET TO TRUE
>>   - or by compaction when snapshot_before_compaction is set to true:
>>     SET TO FALSE
>> - backups, which could've been created automatically, e.g. when
>>   incremental_backups is set to true: SET TO FALSE
>> - mixing repaired and unrepaired sstables, which is usually caused by
>>   incremental repairs, even if they had only been run once: no
>>   incremental repairs done
>> - partially upgraded cluster, e.g. mixed Cassandra versions in the same
>>   cluster: only 1 node in the cluster
>> - token ring change (e.g. adding or removing nodes) without "nodetool
>>   cleanup": only 1 node in the cluster
>> - changes made to the compression table properties: no changes made
>>
>>
>>
>> I don't have the answers to these questions below from Bowen:
>>
>> - do you have any snapshots/backup data in the folder?
>>   - You can try running: nodetool listsnapshots
>> - actual increase in the size of business data?
>>   - maybe a developer used the prod env to inject data and didn't
>>     notice it?
>>
>>
>>
>> I started working in Cassandra administration only a few months ago,
>> so I may be wrong!
>>
>>
>>
>> My team recently migrated Cassandra from version 3 to 4.1.4.
>>
>> They didn't observe this behavior.
>>
>> All tables use the Leveled Compaction Strategy, and we do not perform
>> manual compactions.
>>
>>
>>
>> What I can see is that droppable tombstones are below 5%, which is
>> expected and should not significantly impact the data size.
>>
>> Do we agree with this statement?
>>
>> Have you recently deleted a large amount of data?
>>
>> Could you run the following command on the 80GB+ SSTable and some others?
>>
>>     *$CASSANDRA_HOME/tools/bin/sstablemetadata*
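For example, against the large SSTable mentioned earlier in the thread (the path is illustrative; adjust for your data-directory layout):

```shell
# Prints per-SSTable metadata: estimated droppable tombstones,
# min/max timestamps, repair status, compression ratio, etc.
$CASSANDRA_HOME/tools/bin/sstablemetadata \
  /var/lib/cassandra/data/keyspace1/table1/nb-163033-big-Data.db
```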
>>
>>
>>
>>
>>
>> An SSTable of 80GB+ seems very large.
>>
>> An SSTable ensures that you don't have duplicate data inside, meaning
>> this 80GB+ consists of useful data.
>>
>>
>>
>> So my question is: How do you determine the data size before compaction?
>>
>> I understand that you perform backups.  Could you check the backup
>> archives from before the update?
>>
>> Do these archives contain the full dataset before the update?
>>
>>
>>
>> Additionally, try parsing the business data and verify that all the data
>> is intact.
>>
>> You can use the following command to inspect a few records:
>>
>>
>>
>> *$CASSANDRA_HOME/tools/bin/sstabledump -d*
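A sketch of inspecting a few records (the path is illustrative; piping to head limits the output to the first few lines):

```shell
# Dump row data from one SSTable in the internal debug format,
# one line per partition/row.
$CASSANDRA_HOME/tools/bin/sstabledump -d \
  /var/lib/cassandra/data/keyspace1/table1/nb-163033-big-Data.db | head -n 20
```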
>>
>>
>>
>>
>>
>>
>>
>> Another possibility is to load all the SSTables into another Cassandra
>> instance using:
>>
>>
>>
>> *$CASSANDRA_HOME/bin/sstableloader*
>>
>>
>>
>> Then, check if you get the same data size.
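A sketch of that check (the target host and staging path are placeholders; sstableloader infers keyspace and table from the last two path components):

```shell
# Stream the SSTables into a second Cassandra instance, then compare
# 'nodetool cfstats' / disk usage there against the original node.
$CASSANDRA_HOME/bin/sstableloader -d other-host.example.com \
  /path/to/staging/keyspace1/table1
```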
>>
>>
>>
>> Thank you,
>>
>> Best regards,
>>
>> Fabien
>>
>>
>>
>>
>>
>> *From:* Tasmaniedemon <tasmaniede...@free.fr>
>> *Sent:* Thursday, 20 March 2025 09:06
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>>
>>
>>
>>
>>
>> Hi,
>>
>> Could you give more details about the table usage and data modeling on
>> this single-node Cassandra?
>>
>> Did you begin using Cassandra on version 3, or had you already migrated
>> from a previous version (2.x) before?
>>
>> To be honest, I would suggest using the latest release available, and
>> rebuilding and reloading a fresh new cluster with a very low num_tokens
>> (and 3 nodes :-)
>>
>> May I ask why only a single-node Cassandra? Is scalability not intended?
>>
>> Sorry for my poor English :-)
>>
>> Kind regards
>>
>> Stephane
>>
>>
>>
>>
>>
>> On 19/03/2025 at 14:15, William Crowell via user wrote:
>>
>> Bowen, Fabien, Stéphane, and Luciano,
>>
>>
>>
>> A bit more information here...
>>
>>
>>
>> We have not run incremental repairs, and we have not made any changes to
>> the compression properties on the tables.
>>
>>
>>
>> When we first started the database, the TTL on the records was set to 0,
>> but now it is set to 10 days.
>>
>>
>>
>> We do have one table in a keyspace that is occupying 84.1GB of disk space:
>>
>>
>>
>> ls -l */var/lib/cassandra/data/keyspace1/*table1
>>
>> …
>>
>> -rw-rw-r--. 1 xxxxxxxx xxxxxxxxx *84145170181 *Mar 18 08:28
>> nb-163033-big-Data.db
>>
>> …
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *William Crowell via user <user@cassandra.apache.org>
>> <user@cassandra.apache.org>
>> *Date: *Friday, March 14, 2025 at 10:53 AM
>> *To: *user@cassandra.apache.org <user@cassandra.apache.org>
>> <user@cassandra.apache.org>
>> *Cc: *William Crowell <wcrow...@perforce.com> <wcrow...@perforce.com>,
>> Bowen Song <bo...@bso.ng> <bo...@bso.ng>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>> Bowen,
>>
>>
>>
>> This is just a single Cassandra node.  Unfortunately, I cannot get on the
>> box at the moment, but the following configuration is in cassandra.yaml:
>>
>>
>>
>> snapshot_before_compaction: false
>>
>> auto_snapshot: true
>>
>> incremental_backups: false
>>
>>
>>
>> The only other configuration parameter that had been changed other than
>> the keystore and truststore was num_tokens (default: 16):
>>
>>
>>
>> num_tokens: 256
>>
>>
>>
>> I also noticed the compression ratio on the largest table is not good:
>>  0.566085855123187
>>
>>
>>
>> Regards,
>>
>>
>>
>> William Crowell
>>
>>
>>
>> *From: *Bowen Song via user <user@cassandra.apache.org>
>> <user@cassandra.apache.org>
>> *Date: *Friday, March 14, 2025 at 10:13 AM
>> *To: *William Crowell via user <user@cassandra.apache.org>
>> <user@cassandra.apache.org>
>> *Cc: *Bowen Song <bo...@bso.ng> <bo...@bso.ng>
>> *Subject: *Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x
>> to 4.1.3
>>
>> A few suspects:
>>
>> * snapshots, which could've been created automatically, such as by
>> dropping or truncating tables when auto_snapshot is set to true, or
>> compaction when snapshot_before_compaction is set to true
>>
>> * backups, which could've been created automatically, e.g. when
>> incremental_backups is set to true
>>
>> * mixing repaired and unrepaired sstables, which is usually caused by
>> incremental repairs, even if they had only been run once
>>
>> * partially upgraded cluster, e.g. mixed Cassandra version in the same
>> cluster
>>
>> * token ring change (e.g. adding or removing nodes) without "nodetool
>> cleanup"
>>
>> * actual increase in data size
>>
>> * changes made to the compression table properties
>>
>>
>>
>> To find the root cause, you will need to check the file/folder sizes to
>> find out what is using the extra disk space, and may also need to review
>> the cassandra.yaml file (or post it here with sensitive information
>> removed) and any actions you've made to the cluster prior to the first
>> appearance of the issue.
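A quick way to see what is using the space (the paths assume the default data layout):

```shell
# Per-table on-disk footprint, largest first; snapshots/ and backups/
# live inside each table directory and are included in these totals.
du -sh /var/lib/cassandra/data/*/* | sort -rh | head -n 20

# Snapshot usage as Cassandra accounts for it.
nodetool listsnapshots
```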
>>
>>
>>
>> Also, manually running major compactions is not advised.
>>
>> On 12/03/2025 20:26, William Crowell via user wrote:
>>
>> Hi.  A few months ago, I upgraded a single-node Cassandra instance from
>> version 3 to 4.1.3.  This instance is not very large, with about 15 to 20
>> gigabytes of data on version 3, but after the upgrade it has grown
>> substantially to over 100GB.  I do a compaction once a week and take a
>> snapshot, but with the increase in data the compaction is a much lengthier
>> process.  I also ran sstableupgrade as part of the upgrade.  Any reason
>> for the increased size of the database on the file system?
>>
>>
>>
>> I am using the default STCS compaction strategy.  My “nodetool cfstats”
>> on a heavily used table looks like this:
>>
>>
>>
>> Keyspace : xxxxxxxx
>>
>>         Read Count: 48089
>>
>>         Read Latency: 12.52872569610514 ms
>>
>>         Write Count: 1616682825
>>
>>         Write Latency: 0.0067135265490310386 ms
>>
>>         Pending Flushes: 0
>>
>>                 Table: sometable
>>
>>                 SSTable count: 13
>>
>>                 Old SSTable count: 0
>>
>>                 Space used (live): 104005524836
>>
>>                 Space used (total): 104005524836
>>
>>                 Space used by snapshots (total): 0
>>
>>                 Off heap memory used (total): 116836824
>>
>>                 SSTable Compression Ratio: 0.566085855123187
>>
>>                 Number of partitions (estimate): 14277177
>>
>>                 Memtable cell count: 81033
>>
>>                 Memtable data size: 13899174
>>
>>                 Memtable off heap memory used: 0
>>
>>                 Memtable switch count: 13171
>>
>>                 Local read count: 48089
>>
>>                 Local read latency: NaN ms
>>
>>                 Local write count: 1615681213
>>
>>                 Local write latency: 0.005 ms
>>
>>                 Pending flushes: 0
>>
>>                 Percent repaired: 0.0
>>
>>                 Bytes repaired: 0.000KiB
>>
>>                 Bytes unrepaired: 170.426GiB
>>
>>                 Bytes pending repair: 0.000KiB
>>
>>                 Bloom filter false positives: 125
>>
>>                 Bloom filter false ratio: 0.00494
>>
>>                 Bloom filter space used: 24656936
>>
>>                 Bloom filter off heap memory used: 24656832
>>
>>                 Index summary off heap memory used: 2827608
>>
>>                 Compression metadata off heap memory used: 89352384
>>
>>                 Compacted partition minimum bytes: 73
>>
>>                 Compacted partition maximum bytes: 61214
>>
>>                 Compacted partition mean bytes: 11888
>>
>>                 Average live cells per slice (last five minutes): NaN
>>
>>                 Maximum live cells per slice (last five minutes): 0
>>
>>                 Average tombstones per slice (last five minutes): NaN
>>
>>                 Maximum tombstones per slice (last five minutes): 0
>>
>>                 Dropped Mutations: 0
>>
>>                 Droppable tombstone ratio: 0.04983
>>
>>
>>
>>
>>
>> *This e-mail may contain information that is privileged or confidential.
>> If you are not the intended recipient, please delete the e-mail and any
>> attachments and notify us immediately.*
>>
>>
>>
>>
>>
>> *CAUTION:* This email originated from outside of the organization. Do
>> not click on links or open attachments unless you recognize the sender and
>> know the content is safe.
>>
>>
>>
>> ------------------------------
>>
>> The information in this Internet Email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this Email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful. When addressed
>> to our clients any opinions or advice contained in this Email are subject
>> to the terms and conditions expressed in any applicable governing The Home
>> Depot terms of business or client engagement letter. The Home Depot
>> disclaims all responsibility and liability for the accuracy and content of
>> this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>>
>
