Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-14 Thread William Crowell via user
Bowen,

This is just a single Cassandra node.  Unfortunately, I cannot get on the box 
at the moment, but the following configuration is in cassandra.yaml:

snapshot_before_compaction: false
auto_snapshot: true
incremental_backups: false

The only other configuration parameter changed besides the keystore and 
truststore was num_tokens (the 4.x default is 16):

num_tokens: 256
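
As a quick sanity check, the token count the node actually owns can be 
confirmed with nodetool; for example (assuming nodetool is on the PATH):

# each token owned by the node appears as one row of the ring
nodetool ring

# or list the node's tokens directly (-T/--tokens, available on recent versions)
nodetool info -T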

I also noticed the compression ratio on the largest table is not good: 
0.566085855123187
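
For context, that ratio is compressed size divided by uncompressed size, so 
the data on disk is about 57% of its logical size. As a rough cross-check 
against the cfstats quoted below:

  104005524836 bytes (Space used, live) / 0.566085855 ≈ 183.7 GB uncompressed

which lines up with the ~170.4 GiB (≈ 183 GB) of unrepaired bytes reported in 
the same output.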

Regards,

William Crowell

From: Bowen Song via user 
Date: Friday, March 14, 2025 at 10:13 AM
To: William Crowell via user 
Cc: Bowen Song 
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

A few suspects:

* snapshots, which could've been created automatically, such as by dropping or 
truncating tables when auto_snapshot is set to true, or by compaction when 
snapshot_before_compaction is set to true

* backups, which could've been created automatically, e.g. when 
incremental_backups is set to true

* mixing repaired and unrepaired sstables, which is usually caused by 
incremental repairs, even if they have only been run once

* partially upgraded cluster, e.g. mixed Cassandra version in the same cluster

* token ring change (e.g. adding or removing nodes) without "nodetool cleanup"

* actual increase in data size

* changes made to the table's compression properties



To find the root cause, you will need to check the file/folder sizes to find 
out what is using the extra disk space, and may also need to review the 
cassandra.yaml file (or post it here with sensitive information removed) and 
any actions you've taken on the cluster prior to the first appearance of the 
issue.
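
A rough sketch of that investigation (the paths and the <keyspace>/<table> 
placeholders below are illustrative; adjust them to whatever 
data_file_directories points at, and this assumes nodetool and sstablemetadata 
are on the PATH):

# break down where the space is going: live sstables vs snapshots vs backups
du -sh /var/lib/cassandra/data
du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null
du -sh /var/lib/cassandra/data/*/*/backups 2>/dev/null

# list snapshots, with sizes, as Cassandra itself sees them
nodetool listsnapshots

# check for a repaired/unrepaired mix: a non-zero "Repaired at" means repaired
sstablemetadata /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db | grep -i "repaired at"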



Also, manually running major compactions is not advised: under STCS it 
typically produces one huge SSTable, which then requires a large amount of 
free space to ever compact again.
On 12/03/2025 20:26, William Crowell via user wrote:
Hi.  A few months ago, I upgraded a single-node Cassandra instance from version 
3 to 4.1.3.  This instance is not very large, with about 15 to 20 gigabytes of 
data on version 3, but after the upgrade it has gone up substantially, to over 
100 GB.  I do a compaction once a week and take a snapshot, but with the 
increase in data the compaction is a much lengthier process.  I also ran 
upgradesstables as part of the upgrade.  Any reason for the increased size of 
the database on the file system?

I am using the default STCS compaction strategy.  My “nodetool cfstats” on a 
heavily used table looks like this:

Keyspace : 
Read Count: 48089
Read Latency: 12.52872569610514 ms
Write Count: 1616682825
Write Latency: 0.0067135265490310386 ms
Pending Flushes: 0
Table: sometable
SSTable count: 13
Old SSTable count: 0
Space used (live): 104005524836
Space used (total): 104005524836
Space used by snapshots (total): 0
Off heap memory used (total): 116836824
SSTable Compression Ratio: 0.566085855123187
Number of partitions (estimate): 14277177
Memtable cell count: 81033
Memtable data size: 13899174
Memtable off heap memory used: 0
Memtable switch count: 13171
Local read count: 48089
Local read latency: NaN ms
Local write count: 1615681213
Local write latency: 0.005 ms
Pending flushes: 0
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 170.426GiB
Bytes pending repair: 0.000KiB
Bloom filter false positives: 125
Bloom filter false ratio: 0.00494
Bloom filter space used: 24656936
Bloom filter off heap memory used: 24656832
Index summary off heap memory used: 2827608
Compression metadata off heap memory used: 89352384
Compacted partition minimum bytes: 73
Compacted partition maximum bytes: 61214
Compacted partition mean bytes: 11888
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
Droppable tombstone ratio: 0.04983




Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-14 Thread William Crowell via user
Stéphane,
We do not do any repairs, and maybe that is the issue.  We do a once-weekly 
compaction.

Regards,

William Crowell


From: crystallo...@gmail.com 
Date: Friday, March 14, 2025 at 5:35 AM
To: user@cassandra.apache.org 
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

Hi

Are you making any use of incremental repair?

Kind regards

Stéphane


On 14/03/2025 at 03:37, Luciano Greiner wrote:
Although SSTable files are immutable, there are operations that can delete 
them, such as compactions (merges) and upgrades (upgradesstables, which you 
likely ran during your upgrade).

Even though snapshots are hard links, once the original SSTable file is 
deleted the snapshot effectively becomes a copy of the old file: it keeps 
pointing at the old inode, so the disk blocks cannot be freed.
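
To illustrate the hard-link point with a toy example outside Cassandra (any 
scratch directory will do):

# create a 100 MB file and a hard link to it, as a snapshot would
dd if=/dev/zero of=original-Data.db bs=1M count=100
ln original-Data.db snapshot-Data.db

ls -li original-Data.db snapshot-Data.db   # same inode number, link count 2
du -sh .                                   # ~100M: both names share the same blocks

# delete the "live" file; the remaining link keeps the inode (and blocks) alive
rm original-Data.db
du -sh .                                   # still ~100M, nothing was freed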

Luciano Greiner

On Thu, Mar 13, 2025 at 11:23 PM William Crowell <wcrow...@perforce.com> wrote:
Luciano,

That is very possible.  Any reason why disk usage increased so much from 
version 3 to 4?  Did anything in particular change that would affect disk space?

Thank you for your reply,

William Crowell

From: Luciano Greiner <luciano.grei...@gmail.com>
Date: Thursday, March 13, 2025 at 10:21 PM
To: user@cassandra.apache.org
Cc: William Crowell <wcrow...@perforce.com>
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3
Have you forgotten to clean up some snapshots?

Luciano Greiner



On Thu, Mar 13, 2025 at 11:18 PM William Crowell via user 
<user@cassandra.apache.org> wrote:
Hi,

Is this mailing list still active?

Thanks.

From: William Crowell via user <user@cassandra.apache.org>
Date: Wednesday, March 12, 2025 at 4:42 PM
To: user@cassandra.apache.org
Cc: William Crowell <wcrow...@perforce.com>
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3
I also forgot to include we do compaction once a week.


Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-14 Thread crystallo...@gmail.com

Hi

Are you making any use of incremental repair?

Kind regards

Stéphane




Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-14 Thread William Crowell via user
Sartor and Stéphane,

Thank you for your replies.  We will check this and get back to you.

Regards,

William Crowell

From: Sartor Fabien (DIN) via user 
Date: Friday, March 14, 2025 at 6:03 AM
To: user@cassandra.apache.org 
Cc: Sartor Fabien (DIN) 
Subject: RE: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3
Dear William,

As Luciano mentioned previously, could you check the snapshot folder?
To know where the data is stored, check the value of data_file_directories in 
the cassandra.yaml file.
By default, it is located in the $CASSANDRA_HOME/data/data directory.

You can then find all snapshot folders, with their sizes, using:

find . -iname snapshots -exec du -h {} \;
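
nodetool can also report and remove snapshots directly, which complements the 
find command above (the snapshot name below is a placeholder):

nodetool listsnapshots                      # name, keyspace, table and size of each snapshot
nodetool clearsnapshot -t <snapshot-name>   # delete a single snapshot by name
nodetool clearsnapshot --all                # delete all snapshots (4.x; use with care)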

Best regards,
Fabien
