Jonathan pointed out in another thread that it looks like I'm running into CASSANDRA-2059, where secondary files are not being properly deleted. My production data set at any given time is less than 100 MB in size, but the Cassandra data directories on each instance are using 30 to 40 times as much space right now, and steadily growing.
I understand I can remove the root cause of the problem by applying the patch attached to the bug report, or by upgrading to 0.7.1 when it's out. In the meantime, is it safe to manually delete stale files while Cassandra is running? And how do I determine when a set of files is stale?

I'd assume that a given set of files is deletable if there is no -Data.db file and the -Compacted file has zero length.

Example of what I would think is a set of stale files, without a -Data.db file:

ls -l *3090*
-rw-rw-r-- 1 user group 0 Feb 3 10:00 Payload-e-3090-Compacted
-rw-rw-r-- 1 user group 245 Feb 3 10:00 Payload-e-3090-Filter.db
-rw-rw-r-- 1 user group 4362 Feb 3 10:00 Payload-e-3090-Index.db
-rw-rw-r-- 1 user group 4840 Feb 3 10:00 Payload-e-3090-Statistics.db

I've got these all the way back to Payload-e-1-Index.db.

Non-stale files:

ls -l *3095*
-rw-rw-r-- 1 user group 0 Feb 3 10:35 Payload-e-3095-Compacted
-rw-rw-r-- 1 user group 41269735 Feb 3 10:14 Payload-e-3095-Data.db
-rw-rw-r-- 1 user group 286405 Feb 3 10:14 Payload-e-3095-Filter.db
-rw-rw-r-- 1 user group 7608022 Feb 3 10:14 Payload-e-3095-Index.db
-rw-rw-r-- 1 user group 4840 Feb 3 10:14 Payload-e-3095-Statistics.db

There is an active Data.db file, so I'd leave this group alone.

--Omer
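
P.S. To make the heuristic concrete, here is a minimal, read-only sketch of the check I have in mind. It only lists candidate groups and deletes nothing, since I don't yet know whether removing them under a running node is safe. The function name find_stale_groups and the filename pattern are my own assumptions, inferred from the listings above rather than from anything in Cassandra itself.

```python
import os
import re
from collections import defaultdict

# Matches names such as "Payload-e-3090-Index.db" or "Payload-e-3090-Compacted".
# Assumption: this pattern is inferred from the directory listings in this
# post, not taken from Cassandra's source or documentation.
SSTABLE_NAME = re.compile(r"^(?P<prefix>.+-\d+)-(?P<component>[A-Za-z]+)(?:\.db)?$")


def find_stale_groups(data_dir):
    """Return {prefix: [filenames]} for groups that match the heuristic:
    no -Data.db file, and a -Compacted marker of zero length.

    Read-only: nothing is deleted or modified.
    """
    groups = defaultdict(dict)
    for name in os.listdir(data_dir):
        match = SSTABLE_NAME.match(name)
        if match:
            groups[match.group("prefix")][match.group("component")] = name

    stale = {}
    for prefix, components in groups.items():
        # A group with a Data file is treated as live; a group without a
        # Compacted marker does not fit the heuristic either way.
        if "Data" in components or "Compacted" not in components:
            continue
        marker = os.path.join(data_dir, components["Compacted"])
        if os.path.getsize(marker) == 0:
            stale[prefix] = sorted(components.values())
    return stale
```

Calling find_stale_groups() on a column family's data directory would return the Payload-e-3090 group from the first listing and skip Payload-e-3095, because the latter still has its Data.db file.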