Forgive me if this has been answered somewhere already, but I could not find a concise or clear answer.
I am using Cassandra 2.0.6 on a 3-node cluster. I don't usually run manual compactions and have relied entirely on Cassandra to compact automatically. A couple of days ago, in preparation for an upgrade to Cassandra 2.1.11, I ran a manual, full compaction. It ran for many hours but completed successfully, and the "load" reported by "nodetool status" dropped by 80%. However, I did not see a comparable drop in disk usage, even after waiting a couple of days. There are still many old data files left on disk. For instance, here is the list of data files for one table:

-bash-4.1$ ls -ltr *-Data.db
-rw-r--r-- 1 cassandra cassandra 36441245112 Jan 19 05:42 keyspace-event_index-jb-620839-Data.db
-rw-r--r-- 1 cassandra cassandra 48117578123 Jan 25 05:17 keyspace-event_index-jb-649329-Data.db
-rw-r--r-- 1 cassandra cassandra 8731574747 Jan 27 18:30 keyspace-event_index-jb-662597-Data.db
-rw-r--r-- 1 cassandra cassandra 835204478 Feb 2 07:20 keyspace-event_index-jb-670851-Data.db
-rw-r--r-- 1 cassandra cassandra 39496133 Feb 2 15:29 keyspace-event_index-tmp-jb-672828-Data.db
... about 110 files listed here, removed for clarity ...
-rw-r--r-- 1 cassandra cassandra 149344563 May 9 20:53 keyspace-event_index-tmp-jb-827472-Data.db
-rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18 keyspace-event_index-jb-829601-Data.db
-rw-r--r-- 11 cassandra cassandra 7153875910 May 15 11:15 keyspace-event_index-jb-830446-Data.db
-rw-r--r-- 11 cassandra cassandra 3051908121 May 16 03:08 keyspace-event_index-jb-831112-Data.db
-rw-r--r-- 11 cassandra cassandra 6109582092 May 16 06:11 keyspace-event_index-jb-831709-Data.db
-rw-r--r-- 11 cassandra cassandra 2922532233 May 16 07:14 keyspace-event_index-jb-831873-Data.db
-rw-r--r-- 11 cassandra cassandra 1766025989 May 16 08:31 keyspace-event_index-jb-832111-Data.db
-rw-r--r-- 8 cassandra cassandra 2922259593 May 16 11:39 keyspace-event_index-jb-832693-Data.db
-rw-r--r-- 8 cassandra cassandra 1224495235 May 16 11:50 keyspace-event_index-jb-832764-Data.db
-rw-r--r-- 7 cassandra cassandra 2051385733 May 16 12:57 keyspace-event_index-jb-832975-Data.db
-rw-r--r-- 6 cassandra cassandra 853824939 May 16 13:12 keyspace-event_index-jb-833100-Data.db
-rw-r--r-- 5 cassandra cassandra 763243638 May 16 14:58 keyspace-event_index-jb-833203-Data.db
-rw-r--r-- 3 cassandra cassandra 99076639 May 16 16:29 keyspace-event_index-jb-833222-Data.db
-rw-r--r-- 2 cassandra cassandra 254935385 May 16 17:21 keyspace-event_index-jb-833233-Data.db
-rw-r--r-- 2 cassandra cassandra 66006223 May 16 17:51 keyspace-event_index-jb-833238-Data.db
-rw-r--r-- 1 cassandra cassandra 50204322 May 16 18:18 keyspace-event_index-jb-833243-Data.db
-rw-r--r-- 2 cassandra cassandra 16078537 May 16 18:26 keyspace-event_index-jb-833244-Data.db

However, it looks to me like Cassandra knows that the first 115 files are old, because they are not referenced by a newly created snapshot.
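For reference, the full compaction and the snapshot were triggered with roughly the following commands. I'm reconstructing them from memory, and "keyspace"/"event_index" here are just the redacted keyspace and table names from the listing above:

# reconstructed from memory, not copied from shell history
nodetool compact keyspace event_index        # manual, full (major) compaction
nodetool snapshot -t 20160516-1800 keyspace  # the snapshot listed below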
Here is the newly created snapshot:

-bash-4.1$ ls -ltr snapshots/20160516-1800/*-Data.db
-rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18 snapshots/20160516-1800/keyspace-event_index-jb-829601-Data.db
-rw-r--r-- 11 cassandra cassandra 7153875910 May 15 11:15 snapshots/20160516-1800/keyspace-event_index-jb-830446-Data.db
-rw-r--r-- 11 cassandra cassandra 3051908121 May 16 03:08 snapshots/20160516-1800/keyspace-event_index-jb-831112-Data.db
-rw-r--r-- 11 cassandra cassandra 6109582092 May 16 06:11 snapshots/20160516-1800/keyspace-event_index-jb-831709-Data.db
-rw-r--r-- 11 cassandra cassandra 2922532233 May 16 07:14 snapshots/20160516-1800/keyspace-event_index-jb-831873-Data.db
-rw-r--r-- 11 cassandra cassandra 1766025989 May 16 08:31 snapshots/20160516-1800/keyspace-event_index-jb-832111-Data.db
-rw-r--r-- 8 cassandra cassandra 2922259593 May 16 11:39 snapshots/20160516-1800/keyspace-event_index-jb-832693-Data.db
-rw-r--r-- 8 cassandra cassandra 1224495235 May 16 11:50 snapshots/20160516-1800/keyspace-event_index-jb-832764-Data.db
-rw-r--r-- 7 cassandra cassandra 2051385733 May 16 12:57 snapshots/20160516-1800/keyspace-event_index-jb-832975-Data.db
-rw-r--r-- 6 cassandra cassandra 853824939 May 16 13:12 snapshots/20160516-1800/keyspace-event_index-jb-833100-Data.db
-rw-r--r-- 5 cassandra cassandra 763243638 May 16 14:58 snapshots/20160516-1800/keyspace-event_index-jb-833203-Data.db
-rw-r--r-- 3 cassandra cassandra 99076639 May 16 16:29 snapshots/20160516-1800/keyspace-event_index-jb-833222-Data.db
-rw-r--r-- 2 cassandra cassandra 254935385 May 16 17:21 snapshots/20160516-1800/keyspace-event_index-jb-833233-Data.db
-rw-r--r-- 2 cassandra cassandra 66006223 May 16 17:51 snapshots/20160516-1800/keyspace-event_index-jb-833238-Data.db
-rw-r--r-- 2 cassandra cassandra 16336415 May 16 17:59 snapshots/20160516-1800/keyspace-event_index-jb-833239-Data.db
-rw-r--r-- 2 cassandra cassandra 1947026 May 16 18:00 snapshots/20160516-1800/keyspace-event_index-jb-833240-Data.db
-bash-4.1$

You can see that only files dated "May 15 04:18" or later exist in the snapshot folder.

My questions:

1. I believe Cassandra should have deleted all 115 old data files. What could have prevented those files from being deleted, and what can I do to make sure old files are deleted after future compactions?
2. What are the files with "-tmp-" in their names? What is the implication of their existence? Does it mean a compaction failed?
3. Since Cassandra knows which files are actually in use, is there a utility I can use to delete the old ones? I could delete them manually, but that would be error-prone.
4. The table uses SizeTieredCompactionStrategy and holds data with a TTL of 8 days. Would switching to DateTieredCompactionStrategy after the upgrade to 2.1.11 give much better compaction performance? (A rough sketch of the change I have in mind is in the P.S. below.)

Thanks,
Dongfeng
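P.S. Regarding question 4, the change I'm considering after the upgrade would look something like the following. The keyspace/table names are the redacted ones from the listings above, and the DTCS option values are only a first guess based on the 8-day TTL, not tuned or tested:

# illustrative only -- option values are a first guess, not something I've tried
cqlsh -e "ALTER TABLE keyspace.event_index
          WITH compaction = { 'class': 'DateTieredCompactionStrategy',
                              'base_time_seconds': 3600,
                              'max_sstable_age_days': 8 };"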