You will DEFINITELY not remove sstables obsoleted by compaction if they are
still being streamed out to neighbors. It would also not surprise me if
something holding a background reference to one of the sstables in the
oldest/older compaction transaction logs caused the whole process to block,
waiting on the tidier to clean that up.

Things that may hold references:
- Validation compactions (repair)
- Index build/rebuild
- Streaming (repair, bootstrap, move, decom)

If you have repairs running, you can try pausing/cancelling them and/or
stopping validation/index_build compactions.
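Something along these lines can show what is holding references and stop the
likely candidates (a rough sketch; exact nodetool syntax varies a bit between
versions, and repair_admin only covers incremental repair sessions):

nodetool netstats            # active streams (repair, bootstrap, move, decom)
nodetool compactionstats     # running compactions, incl. validation / index builds
nodetool stop VALIDATION     # stop validation compactions
nodetool stop INDEX_BUILD    # stop index build compactions
nodetool repair_admin list
nodetool repair_admin cancel --session <session-id>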



On Tue, Apr 4, 2023 at 2:29 PM Gil Ganz <gilg...@gmail.com> wrote:

> If it were just one instance I would just bounce it, but this happens when
> we join/remove nodes, and we have a lot of servers with this issue (before
> the join/remove we are at ~50% disk usage).
> We found ourselves fighting with compaction to make sure we don't run out
> of space.
> Will open a ticket, thanks.
>
>
> On Wed, Apr 5, 2023 at 12:10 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> If you restart the node, it'll process/purge those compaction logs on
>> startup, but you want them processed/purged now.
>>
>> I genuinely don't know when the tidier runs, but it's likely the case that
>> you're too busy compacting to purge (though I don't know exactly what "too
>> busy" means).
>>
>> Since you're close to 95% disk full, bounce one instance at a time to
>> recover the space, but we probably need a JIRA to understand exactly what's
>> blocking the tidier from running.
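>>
>> A minimal per-node bounce, assuming systemd (drain first so the restart is
>> clean, and wait for the node to show UN in nodetool status before moving to
>> the next one):
>>
>> nodetool drain && sudo systemctl restart cassandra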
>>
>>
>>
>> On Tue, Apr 4, 2023 at 1:55 PM Gil Ganz <gilg...@gmail.com> wrote:
>>
>>> More information - from another node in the cluster
>>>
>>> I can see many txn files although I only have two compactions running.
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# ls -l *txn*
>>> -rw-r--r-- 1 cassandra cassandra 613 Apr  4 05:26
>>> nb_txn_compaction_09e3aa40-d2a7-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 461 Apr  4 10:17
>>> nb_txn_compaction_11433360-d265-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 463 Apr  4 09:48
>>> nb_txn_compaction_593e5320-d265-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 614 Apr  3 22:47
>>> nb_txn_compaction_701d62d0-d264-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 136 Apr  3 22:27
>>> nb_txn_compaction_bb770b50-d26e-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 463 Apr  4 09:23
>>> nb_txn_compaction_ce51bfe0-d264-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 134 Apr  4 10:31
>>> nb_txn_compaction_d17c7380-d2d3-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 464 Apr  4 09:24
>>> nb_txn_compaction_ed7fc650-d264-11ed-b76b-3b279f6334bc.log
>>> -rw-r--r-- 1 cassandra cassandra 613 Apr  3 22:54
>>> nb_txn_compaction_f456f3b0-d271-11ed-b76b-3b279f6334bc.log
>>>
>>> Let's take for example the one from "Apr  4 09:24"
>>> I can see the matching log message in system.log
>>>
>>> INFO  [CompactionExecutor:142085] 2023-04-04 09:24:29,892
>>> CompactionTask.java:241 - Compacted (ed7fc650-d264-11ed-b76b-3b279f6334bc)
>>> 2 sstables to
>>> [/var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big,]
>>> to level=0.  362.987GiB to 336.323GiB (~92% of original) in 43,625,742ms.
>>> Read Throughput = 8.520MiB/s, Write Throughput = 7.894MiB/s, Row Throughput
>>> = ~-11,482/s.  3,755,353,838 total partitions merged to 3,479,484,261.
>>> Partition merge counts were {1:3203614684, 2:275869577, }
>>>
>>>
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# cat
>>> nb_txn_compaction_ed7fc650-d264-11ed-b76b-3b279f6334bc.log
>>>
>>> ADD:[/var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-,0,8][2633027732]
>>>
>>> REMOVE:[/var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-,1680543675071,8][4190572643]
>>>
>>> REMOVE:[/var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12109-big-,1680554352704,8][3101929253]
>>> COMMIT:[,0,0][2613697770]
>>>
>>> I would expect sstable 10334 to be gone, since the compaction finished an
>>> hour ago, but the files are still there.
>>>
>>>
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# ls -l
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-*
>>> -rw-r--r-- 1 cassandra cassandra    315582480 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 361124597166 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra           10 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   4316334488 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra 283651305837 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra        11934 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra    763353245 Apr  3 17:41
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra           92 Mar 24 16:46
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-TOC.txt
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# ls -l
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-*
>>> -rw-r--r-- 1 cassandra cassandra    315582480 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 361124597166 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra           10 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   4316334488 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra 283651305837 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra        11934 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra    709470839 Apr  4 09:41
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra           92 Apr  4 09:24
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-12142-big-TOC.txt
>>>
>>>
>>> df and du show the following:
>>>
>>> [user@server808 disk1]# df -h
>>> /dev/mapper/data_vg-cass_lv      7.0T  6.6T  411G  95%
>>> /var/lib/cassandra/data/disk1
>>> [user@server808 disk1]# du . -h|grep T
>>> 6.6T    ./kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a
>>> 6.6T    ./kt_ks
>>> 6.6T    .
>>>
>>>
>>> If I run sstableutil to clean things up, df still shows the space as
>>> occupied, but du now shows the reduction, and the files appear in lsof
>>> output as deleted:
>>>
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]#
>>> sstableutil -c -v kt_ks new_table
>>> Warning: Maximum heap size rounded up to 512 MB
>>> WARN  10:52:18,106 memtable_cleanup_threshold has been deprecated and
>>> should be removed from cassandra.yaml
>>> Cleaning up...
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# df -h
>>> /dev/mapper/data_vg-cass_lv      7.0T  6.6T  432G  94%
>>> /var/lib/cassandra/data/disk1
>>> [user@server808 new_table-44263b406bf111ed8bd9df80ace3677a]# ls -l *txn*
>>> ls: cannot access *txn*: No such file or directory
>>> [user@server808 disk1]# du . -h|grep T
>>> 3.8T    ./kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a
>>> 3.8T    ./kt_ks
>>> 3.8T    .
>>>
>>>
>>> The output of lsof | grep deleted contains many rows like this:
>>>
>>> LocalPool 38608 40310 cassandra  879r      REG              253,3
>>> 283651305837  4297002780
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Index.db
>>> (deleted)
>>> LocalPool 38608 40310 cassandra 1061r      REG              253,3
>>> 361124597166  4297002778
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Data.db
>>> (deleted)
>>> OptionalT 38608 40352 cassandra  879r      REG              253,3
>>> 283651305837  4297002780
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Index.db
>>> (deleted)
>>> OptionalT 38608 40352 cassandra 1061r      REG              253,3
>>> 361124597166  4297002778
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Data.db
>>> (deleted)
>>> logback-4 38608 40439 cassandra 1061r      REG              253,3
>>> 361124597166  4297002778
>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-10334-big-Data.db
>>> (deleted)
>>>
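>>>
>>> A rough way to total the space still held by these deleted-but-open files
>>> (assuming the size and inode are in columns 8 and 9 of this lsof output;
>>> adjust the column numbers if your lsof prints fewer fields):
>>>
>>> lsof -nP | grep '(deleted)' | awk '!seen[$9]++ {sum+=$8} END {print sum/1024/1024/1024, "GiB"}'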
>>>
>>> I will add that this cluster contains a single table, which uses TWCS.
>>>
>>> gil
>>>
>>>
>>> On Tue, Apr 4, 2023 at 10:14 AM Gil Ganz <gilg...@gmail.com> wrote:
>>>
>>>> Hey
>>>> I have a 4.0.7 cluster in which I see weird behavior.
>>>> I expect that once compaction finishes, the old sstables that were part
>>>> of the compaction set will be deleted, but it appears they are deleted much
>>>> later, thus causing space issues.
>>>>
>>>> For example, this is what I have in the log: only one compaction
>>>> finished, which took 3 sstables as input, yet I can see many sstables
>>>> being deleted afterwards:
>>>>
>>>>
>>>> INFO  [CompactionExecutor:17344] 2023-04-04 04:39:07,345
>>>> CompactionTask.java:241 - Compacted (489ddc80-d282-11ed-b4fb-d9d812cda140)
>>>> 3 sstables to
>>>> [/var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5685-big,]
>>>> to level=0.  140.672GiB to 134.554GiB (~95% of original) in 13,894,920ms.
>>>> Read Throughput = 10.367MiB/s, Write Throughput = 9.916MiB/s, Row
>>>> Throughput = ~-80,450/s.  1,425,146,188 total partitions merged to
>>>> 1,378,482,110.  Partition merge counts were {1:1339149089, 2:32001964,
>>>> 3:7331057, }
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:08,151 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5546-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:09,399 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5424-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:10,563 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5677-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:12,143 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-3360-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:12,754 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5604-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:13,328 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5572-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:13,340 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5684-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:13,363 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-5630-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:14,492 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-4991-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:16,338 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-2537-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:18,287 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-4171-big
>>>> INFO  [NonPeriodicTasks:1] 2023-04-04 04:39:21,467 SSTable.java:111 -
>>>> Deleting sstable:
>>>> /var/lib/cassandra/data/disk1/kt_ks/new_table-44263b406bf111ed8bd9df80ace3677a/nb-1767-big
>>>>
>>>>
>>>> Am I missing something about how/when sstables are deleted? Is there
>>>> something I can do to control when that delete happens?
>>>>
>>>> gil
>>>>
>>>
