Re: (Full) compaction does not delete (all) old files

Alain RODRIGUEZ Wed, 01 Jun 2016 04:37:28 -0700

Hi,

About your main concern:


1. True, those files should have been removed. However, Cassandra 2.0 is
no longer supported, let alone a release as old as 2.0.6, so I doubt anyone
is going to dig into this issue. Upgrading will probably be enough to fix it.

> I don't usually run manual compaction, and relied completely on Cassandra
> to automatically do it. A couple of days ago in preparation for an upgrade
> to Cassandra 2.1.11, I ran a manual, complete compaction


2. As you might know, sstables are immutable, so compaction (merging row
fragments) cannot be done in place: the result has to be written to new
files. Those '-tmp-' files are basically the output of ongoing compactions,
which is perfectly normal. However, '-tmp-' files are supposed to be removed
once the compaction is done.

3. Restarting the node will most probably solve your issue. To be sure the
disk space is actually freed, check that no snapshot still references those
old sstables.

4. The advantage of DTCS is that data of different ages is not mixed in the
same sstable, meaning Cassandra can drop a fully expired sstable without
compacting it. It sounds like a good fit. However, this compaction strategy
is the most recent one and some things are still being fixed. I still think
it is safe to use. Make sure you read these first:
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ and/or
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
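
To illustrate point 4, here is a minimal Python sketch of why grouping data
by age lets whole sstables be dropped. It is a simplified model, not
Cassandra's actual logic: it ignores gc_grace_seconds, tombstones and
overlapping sstables, and the names SSTable, fully_expired and droppable are
made up for this example.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: only its write-time range."""
    name: str
    min_write_ts: int  # oldest write in the file, in seconds
    max_write_ts: int  # newest write in the file, in seconds


def fully_expired(sstable, ttl_seconds, now):
    """True when even the newest cell in the file has outlived its TTL."""
    return sstable.max_write_ts + ttl_seconds <= now


def droppable(sstables, ttl_seconds, now):
    """Names of the files that could be deleted whole, without compaction."""
    return [s.name for s in sstables if fully_expired(s, ttl_seconds, now)]
```

With the 8 day TTL from the original message, one sstable mixing writes from
day 0 to day 19 is not droppable on day 20, because its newest data is still
live. If the same data is split into files covering days 0-5, 5-10, 10-15
and 15-19, droppable(..., 8 * DAY, 20 * DAY) returns the first two.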

You also might want to have a look at https://github.com/jeffjirsa/twcs.
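
Coming back to points 2 and 3: a small read-only Python sketch like the one
below can show how much space the '-tmp-' files take and which data files
are still held by another hard link (for example a snapshot), in which case
removing the live copy alone frees nothing. It relies only on the file
naming and the link counts visible in the "ls -ltr" output quoted below. The
function names are made up for this example, it deletes nothing, and a link
count of 1 does not by itself mean a file is obsolete.

```python
import os


def scan_data_files(directory):
    """Return one dict per *-Data.db file found directly in `directory`."""
    report = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith("-Data.db"):
            continue
        st = os.stat(os.path.join(directory, name))
        report.append({
            "name": name,
            "size": st.st_size,
            # '-tmp-' in the name marks the output of a compaction in flight
            "is_tmp": "-tmp-" in name,
            # hard links beyond the file itself, e.g. from a snapshot
            "extra_links": st.st_nlink - 1,
        })
    return report


def summarize(report):
    """Return (bytes used by '-tmp-' files, names with extra hard links)."""
    tmp_bytes = sum(f["size"] for f in report if f["is_tmp"])
    pinned = [f["name"] for f in report if f["extra_links"] > 0]
    return tmp_bytes, pinned
```

Calling summarize(scan_data_files(path)) on the table's data directory, with
path being that directory, gives the total size of the '-tmp-' files and the
list of files that a snapshot (or any other hard link) still keeps on disk.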

Some other off-topic, but maybe useful, questions / info:

A - Why do you need a manual compaction before upgrading? I really can't
see any reason for it.
B - Why upgrade to Cassandra 2.1.11 when 2.1.14 is available and brings
some more bug fixes (compared to 2.1.11)?
C - It is recommended to move to the latest 2.0 release before going to
2.1.X, otherwise you might run into issues. Either make sure to test that it
works, or go incrementally: 2.0.6 --> 2.0.17 --> 2.1.14. I would probably do
both: test it and go incrementally. In any case, I would not go 2.0.6 -->
2.1.14 without testing it first.

Hope it is all clear and that a restart will solve your issue.

C*heers,

-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-17 0:06 GMT+01:00 Dongfeng Lu <dlu66...@yahoo.com>:

> Forgive me if that has been answered somewhere, but I could not find a
> concise or clear answer.
>
> I am using Cassandra 2.0.6 on a 3 node cluster. I don't usually run manual
> compaction, and relied completely on Cassandra to automatically do it. A
> couple of days ago in preparation for an upgrade to Cassandra 2.1.11, I ran
> a manual, complete compaction. The compaction ran for many hours, but it
> did complete successfully, and the "load" in "nodetool status" dropped 80%.
> However, I did not see a big drop in disk usage, even after waiting for a
> couple of days. There are still many old data files left on the disk. For
> instance, here is a list of data files for one table.
>
> -bash-4.1$ ls -ltr *-Data.db
> -rw-r--r--  1 cassandra cassandra 36441245112 Jan 19 05:42
> keyspace-event_index-jb-620839-Data.db
> -rw-r--r--  1 cassandra cassandra 48117578123 Jan 25 05:17
> keyspace-event_index-jb-649329-Data.db
> -rw-r--r--  1 cassandra cassandra  8731574747 Jan 27 18:30
> keyspace-event_index-jb-662597-Data.db
> -rw-r--r--  1 cassandra cassandra   835204478 Feb  2 07:20
> keyspace-event_index-jb-670851-Data.db
> -rw-r--r--  1 cassandra cassandra    39496133 Feb  2 15:29
> keyspace-event_index-tmp-jb-672828-Data.db
> ... about 110 files listed here, removed for clarity ...
>
> -rw-r--r--  1 cassandra cassandra   149344563 May  9 20:53
> keyspace-event_index-tmp-jb-827472-Data.db
> -rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18
> keyspace-event_index-jb-829601-Data.db
> -rw-r--r-- 11 cassandra cassandra  7153875910 May 15 11:15
> keyspace-event_index-jb-830446-Data.db
> -rw-r--r-- 11 cassandra cassandra  3051908121 May 16 03:08
> keyspace-event_index-jb-831112-Data.db
> -rw-r--r-- 11 cassandra cassandra  6109582092 May 16 06:11
> keyspace-event_index-jb-831709-Data.db
> -rw-r--r-- 11 cassandra cassandra  2922532233 May 16 07:14
> keyspace-event_index-jb-831873-Data.db
> -rw-r--r-- 11 cassandra cassandra  1766025989 May 16 08:31
> keyspace-event_index-jb-832111-Data.db
> -rw-r--r--  8 cassandra cassandra  2922259593 May 16 11:39
> keyspace-event_index-jb-832693-Data.db
> -rw-r--r--  8 cassandra cassandra  1224495235 May 16 11:50
> keyspace-event_index-jb-832764-Data.db
> -rw-r--r--  7 cassandra cassandra  2051385733 May 16 12:57
> keyspace-event_index-jb-832975-Data.db
> -rw-r--r--  6 cassandra cassandra   853824939 May 16 13:12
> keyspace-event_index-jb-833100-Data.db
> -rw-r--r--  5 cassandra cassandra   763243638 May 16 14:58
> keyspace-event_index-jb-833203-Data.db
> -rw-r--r--  3 cassandra cassandra    99076639 May 16 16:29
> keyspace-event_index-jb-833222-Data.db
> -rw-r--r--  2 cassandra cassandra   254935385 May 16 17:21
> keyspace-event_index-jb-833233-Data.db
> -rw-r--r--  2 cassandra cassandra    66006223 May 16 17:51
> keyspace-event_index-jb-833238-Data.db
> -rw-r--r--  1 cassandra cassandra    50204322 May 16 18:18
> keyspace-event_index-jb-833243-Data.db
> -rw-r--r--  2 cassandra cassandra    16078537 May 16 18:26
> keyspace-event_index-jb-833244-Data.db
>
> However, it looks to me that Cassandra knows that the first 115 files are
> old, since they are not used when creating a snapshot. Here is the newly
> created snapshot.
>
> -bash-4.1$ ls -ltr snapshots/20160516-1800/*-Data.db
> -rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18
> snapshots/20160516-1800/keyspace-event_index-jb-829601-Data.db
> -rw-r--r-- 11 cassandra cassandra  7153875910 May 15 11:15
> snapshots/20160516-1800/keyspace-event_index-jb-830446-Data.db
> -rw-r--r-- 11 cassandra cassandra  3051908121 May 16 03:08
> snapshots/20160516-1800/keyspace-event_index-jb-831112-Data.db
> -rw-r--r-- 11 cassandra cassandra  6109582092 May 16 06:11
> snapshots/20160516-1800/keyspace-event_index-jb-831709-Data.db
> -rw-r--r-- 11 cassandra cassandra  2922532233 May 16 07:14
> snapshots/20160516-1800/keyspace-event_index-jb-831873-Data.db
> -rw-r--r-- 11 cassandra cassandra  1766025989 May 16 08:31
> snapshots/20160516-1800/keyspace-event_index-jb-832111-Data.db
> -rw-r--r--  8 cassandra cassandra  2922259593 May 16 11:39
> snapshots/20160516-1800/keyspace-event_index-jb-832693-Data.db
> -rw-r--r--  8 cassandra cassandra  1224495235 May 16 11:50
> snapshots/20160516-1800/keyspace-event_index-jb-832764-Data.db
> -rw-r--r--  7 cassandra cassandra  2051385733 May 16 12:57
> snapshots/20160516-1800/keyspace-event_index-jb-832975-Data.db
> -rw-r--r--  6 cassandra cassandra   853824939 May 16 13:12
> snapshots/20160516-1800/keyspace-event_index-jb-833100-Data.db
> -rw-r--r--  5 cassandra cassandra   763243638 May 16 14:58
> snapshots/20160516-1800/keyspace-event_index-jb-833203-Data.db
> -rw-r--r--  3 cassandra cassandra    99076639 May 16 16:29
> snapshots/20160516-1800/keyspace-event_index-jb-833222-Data.db
> -rw-r--r--  2 cassandra cassandra   254935385 May 16 17:21
> snapshots/20160516-1800/keyspace-event_index-jb-833233-Data.db
> -rw-r--r--  2 cassandra cassandra    66006223 May 16 17:51
> snapshots/20160516-1800/keyspace-event_index-jb-833238-Data.db
> -rw-r--r--  2 cassandra cassandra    16336415 May 16 17:59
> snapshots/20160516-1800/keyspace-event_index-jb-833239-Data.db
> -rw-r--r--  2 cassandra cassandra     1947026 May 16 18:00
> snapshots/20160516-1800/keyspace-event_index-jb-833240-Data.db
> -bash-4.1$
>
> You can see that only files dated "May 15 04:18" or later exist in the
> snapshot folder.
>
> My questions:
>
> 1. I believe Cassandra should have deleted all old 115 data files. What
> could have prevented those files being deleted? What can I do to make sure
> old files will be deleted in future compactions?
> 2. What are those files with "-tmp-"? What is the implication of their
> existence? Does it mean a compaction failed?
> 3. Since Cassandra knows what files are actually used, is there some
> utility that I can use to delete those old files? I can delete them
> manually, but that would be error-prone.
> 4. The table uses SizeTieredCompactionStrategy, and contains data with a
> TTL of 8 days. Will switching to DateTieredCompactionStrategy after
> upgrading to 2.1.11 offer much better compaction performance?
>
> Thanks,
> Dongfeng
>
