Forgive me if this has been answered somewhere, but I could not find a concise 
or clear answer.

I am using Cassandra 2.0.6 on a 3-node cluster. I don't usually run manual 
compactions; I have relied entirely on Cassandra to run them automatically. A 
couple of days ago, in preparation for an upgrade to Cassandra 2.1.11, I ran a 
manual, full compaction. It ran for many hours but completed successfully, and 
the "load" reported by "nodetool status" dropped by 80%. However, I did not see 
a corresponding drop in disk usage, even after waiting a couple of days.
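
For reference, the manual compaction was started with roughly the following 
command on each node (the keyspace and table names here are just the ones that 
appear in the file listing below):

-bash-4.1$ nodetool compact keyspace event_index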
There are still many old data files left on the disk. For instance, here is a 
list of data files for one table.

-bash-4.1$ ls -ltr *-Data.db
-rw-r--r--  1 cassandra cassandra 36441245112 Jan 19 05:42 
keyspace-event_index-jb-620839-Data.db
-rw-r--r--  1 cassandra cassandra 48117578123 Jan 25 05:17 
keyspace-event_index-jb-649329-Data.db
-rw-r--r--  1 cassandra cassandra  8731574747 Jan 27 18:30 
keyspace-event_index-jb-662597-Data.db
-rw-r--r--  1 cassandra cassandra   835204478 Feb  2 07:20 
keyspace-event_index-jb-670851-Data.db
-rw-r--r--  1 cassandra cassandra    39496133 Feb  2 15:29 
keyspace-event_index-tmp-jb-672828-Data.db
... about 110 files listed here, removed for clarity ...

-rw-r--r--  1 cassandra cassandra   149344563 May  9 20:53 
keyspace-event_index-tmp-jb-827472-Data.db
-rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18 
keyspace-event_index-jb-829601-Data.db
-rw-r--r-- 11 cassandra cassandra  7153875910 May 15 11:15 
keyspace-event_index-jb-830446-Data.db
-rw-r--r-- 11 cassandra cassandra  3051908121 May 16 03:08 
keyspace-event_index-jb-831112-Data.db
-rw-r--r-- 11 cassandra cassandra  6109582092 May 16 06:11 
keyspace-event_index-jb-831709-Data.db
-rw-r--r-- 11 cassandra cassandra  2922532233 May 16 07:14 
keyspace-event_index-jb-831873-Data.db
-rw-r--r-- 11 cassandra cassandra  1766025989 May 16 08:31 
keyspace-event_index-jb-832111-Data.db
-rw-r--r--  8 cassandra cassandra  2922259593 May 16 11:39 
keyspace-event_index-jb-832693-Data.db
-rw-r--r--  8 cassandra cassandra  1224495235 May 16 11:50 
keyspace-event_index-jb-832764-Data.db
-rw-r--r--  7 cassandra cassandra  2051385733 May 16 12:57 
keyspace-event_index-jb-832975-Data.db
-rw-r--r--  6 cassandra cassandra   853824939 May 16 13:12 
keyspace-event_index-jb-833100-Data.db
-rw-r--r--  5 cassandra cassandra   763243638 May 16 14:58 
keyspace-event_index-jb-833203-Data.db
-rw-r--r--  3 cassandra cassandra    99076639 May 16 16:29 
keyspace-event_index-jb-833222-Data.db
-rw-r--r--  2 cassandra cassandra   254935385 May 16 17:21 
keyspace-event_index-jb-833233-Data.db
-rw-r--r--  2 cassandra cassandra    66006223 May 16 17:51 
keyspace-event_index-jb-833238-Data.db
-rw-r--r--  1 cassandra cassandra    50204322 May 16 18:18 
keyspace-event_index-jb-833243-Data.db
-rw-r--r--  2 cassandra cassandra    16078537 May 16 18:26 
keyspace-event_index-jb-833244-Data.db

However, it looks like Cassandra knows that the first 115 files are old and 
does not actually use them when creating a snapshot. Here is the newly created 
snapshot.

-bash-4.1$ ls -ltr snapshots/20160516-1800/*-Data.db
-rw-r--r-- 11 cassandra cassandra 20149715779 May 15 04:18 
snapshots/20160516-1800/keyspace-event_index-jb-829601-Data.db
-rw-r--r-- 11 cassandra cassandra  7153875910 May 15 11:15 
snapshots/20160516-1800/keyspace-event_index-jb-830446-Data.db
-rw-r--r-- 11 cassandra cassandra  3051908121 May 16 03:08 
snapshots/20160516-1800/keyspace-event_index-jb-831112-Data.db
-rw-r--r-- 11 cassandra cassandra  6109582092 May 16 06:11 
snapshots/20160516-1800/keyspace-event_index-jb-831709-Data.db
-rw-r--r-- 11 cassandra cassandra  2922532233 May 16 07:14 
snapshots/20160516-1800/keyspace-event_index-jb-831873-Data.db
-rw-r--r-- 11 cassandra cassandra  1766025989 May 16 08:31 
snapshots/20160516-1800/keyspace-event_index-jb-832111-Data.db
-rw-r--r--  8 cassandra cassandra  2922259593 May 16 11:39 
snapshots/20160516-1800/keyspace-event_index-jb-832693-Data.db
-rw-r--r--  8 cassandra cassandra  1224495235 May 16 11:50 
snapshots/20160516-1800/keyspace-event_index-jb-832764-Data.db
-rw-r--r--  7 cassandra cassandra  2051385733 May 16 12:57 
snapshots/20160516-1800/keyspace-event_index-jb-832975-Data.db
-rw-r--r--  6 cassandra cassandra   853824939 May 16 13:12 
snapshots/20160516-1800/keyspace-event_index-jb-833100-Data.db
-rw-r--r--  5 cassandra cassandra   763243638 May 16 14:58 
snapshots/20160516-1800/keyspace-event_index-jb-833203-Data.db
-rw-r--r--  3 cassandra cassandra    99076639 May 16 16:29 
snapshots/20160516-1800/keyspace-event_index-jb-833222-Data.db
-rw-r--r--  2 cassandra cassandra   254935385 May 16 17:21 
snapshots/20160516-1800/keyspace-event_index-jb-833233-Data.db
-rw-r--r--  2 cassandra cassandra    66006223 May 16 17:51 
snapshots/20160516-1800/keyspace-event_index-jb-833238-Data.db
-rw-r--r--  2 cassandra cassandra    16336415 May 16 17:59 
snapshots/20160516-1800/keyspace-event_index-jb-833239-Data.db
-rw-r--r--  2 cassandra cassandra     1947026 May 16 18:00 
snapshots/20160516-1800/keyspace-event_index-jb-833240-Data.db
-bash-4.1$

You can see that only files dated "May 15 04:18" or later exist in the snapshot 
folder.
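
Incidentally, the second column of the ls output is the hard-link count, so the 
files in the snapshot folder appear to be hard links to the live sstables 
rather than copies, which matches the identical sizes and timestamps. For any 
single file this can be checked with stat (assuming GNU coreutils; %h prints 
the hard-link count):

-bash-4.1$ stat -c '%h %n' keyspace-event_index-jb-829601-Data.db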

My questions:

1. I believe Cassandra should have deleted all 115 old data files. What could 
have prevented those files from being deleted? What can I do to make sure old 
files are deleted after future compactions?
2. What are the files with "-tmp-" in their names? What is the implication of 
their existence? Does it mean a compaction failed?
3. Since Cassandra apparently knows which files are actually in use, is there a 
utility I can use to delete the old ones? I could delete them manually (roughly 
along the lines of the comparison sketched after this list), but that seems 
error-prone.
4. The table uses SizeTieredCompactionStrategy and contains data with a TTL of 
8 days. Would switching to DateTieredCompactionStrategy after upgrading to 
2.1.11 offer much better compaction performance?
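
To illustrate what I mean in question 3, this is roughly the comparison I have 
been doing by eye: list the Data.db files in the table directory that are not 
present in the latest snapshot (the data directory path is just an example, and 
I am assuming the snapshot reflects the current live set):

-bash-4.1$ cd /var/lib/cassandra/data/keyspace/event_index
-bash-4.1$ comm -23 <(ls *-Data.db | sort) \
>                   <(cd snapshots/20160516-1800 && ls *-Data.db | sort)

comm -23 prints only the names that appear in the first listing and not in the 
second, i.e. the files that look like candidates for removal. I have not 
deleted anything based on this; I would rather have Cassandra or a supported 
tool do it.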
Thanks,
Dongfeng
