Rob,

Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and using size-tiered compaction (this CF is append-only, i.e. zero updates).
If I understand you correctly, it sounds like we can get away with a rolling (stop, delete the old data files, restart) process.

Thanks,
Brian

On Wed, Jun 18, 2014 at 2:37 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox <tar...@cabotresearch.com> wrote:
>
>> I have a column family that only stores the last 5 days' worth of some
>> data... and yet I have files in the data directory for this CF that are 3
>> weeks old.
>
> Are you using TTL? If so:
>
> https://issues.apache.org/jira/browse/CASSANDRA-6654
>
> Are you using size-tiered or leveled compaction?
>
>> I have six bunches of these file groups, each with a different nnnn
>> value... and with timestamps of each of the last five days... plus one group
>> from 3 weeks ago... which makes me wonder if that group somehow should have
>> been deleted but was not.
>>
>> The files are tens or hundreds of gigs, so deleting them would be good,
>> unless it's really bad!
>
> Data files can't be deleted from the data dir with Cassandra running, but
> it should be fine (if probably technically unsupported) to delete them with
> Cassandra stopped. In most cases you don't want to do so, because you might
> un-mask deleted rows or cause unexpected consistency characteristics.
>
> In your case, you know that no data in files that are 3 weeks old can
> possibly have any value, so it is safe to delete them.
>
> =Rob
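Before stopping each node, it may help to list which SSTables in the CF directory actually fall outside the 5-day window. The sketch below is a hypothetical Python helper, not a Cassandra tool. It assumes 1.2-style file names such as `ks-cf-ic-1234-Data.db` (keyspace, CF, format version, generation, component), groups every component sharing the same prefix so a whole SSTable is handled together, and uses file modification time as a stand-in for the age of the data. It only reports candidates; it deletes nothing.

```python
import os
import time
from collections import defaultdict
from typing import Dict, List, Optional


def find_stale_sstables(cf_dir: str, max_age_days: float,
                        now: Optional[float] = None) -> Dict[str, List[str]]:
    """Hypothetical helper: return {sstable_prefix: [component paths]} for
    SSTables whose newest component is older than max_age_days.

    Assumes 1.2-style names like ks-cf-ic-1234-Data.db, where the
    second-to-last dash-separated field is the SSTable generation and
    everything before the component name identifies one SSTable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in os.listdir(cf_dir):
        path = os.path.join(cf_dir, name)
        parts = name.split("-")
        # Skip directories (e.g. snapshots) and names that don't look like
        # <...>-<generation>-<Component> SSTable files.
        if not os.path.isfile(path) or len(parts) < 4 or not parts[-2].isdigit():
            continue
        # Group by prefix (e.g. "ks-cf-ic-1234") so all components of one
        # SSTable (Data, Index, Filter, ...) are kept or removed together.
        groups["-".join(parts[:-1])].append(path)
    stale: Dict[str, List[str]] = {}
    for prefix, paths in groups.items():
        # Judge a group by its newest component, so it is only flagged
        # when every file in it is older than the cutoff.
        newest = max(os.path.getmtime(p) for p in paths)
        if newest < cutoff:
            stale[prefix] = sorted(paths)
    return stale


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python find_stale.py /path/to/data/ks/cf 5
    for prefix, paths in sorted(find_stale_sstables(sys.argv[1], float(sys.argv[2])).items()):
        print(prefix)
        for p in paths:
            print("   ", p)
```

One possible rolling pass per node: run `nodetool drain`, stop Cassandra, run the helper against that CF's data directory, move (rather than delete) every file listed for the stale group, restart, and confirm the node is healthy before moving on to the next one. Modification time is only a proxy, so it is worth checking that the flagged group really is the 3-week-old one before removing it.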