I am on the very extreme end: ad serving. Cookies are very ephemeral. Some live a long time, like 20 days, but the majority of our entries are valid for only a single day. If we have gc grace set to 10 days, our data store is 10 times larger than it "needs" to be, so we clean up very aggressively all the time. We lower gc grace to three days. On day X at midnight we might have a 20 GB column family. During the day we might add 10 GB of new rows, overwrites, and tombstones. If I let this go on for three days I might have 40-60 GB of data (the way size-tiered compaction works out). So in my case, I run a major compaction every night. For about an hour the column family is busy with the major, but then we are down to 20 GB again! Yay.
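To put rough numbers on the above, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function names are made up, and it assumes (a simplification on my part) that each day adds a flat 10 GB and that nothing is reclaimed until the next major compaction runs.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gc_grace_seconds(days):
    """Convert a gc grace period in days to the seconds value the table setting expects."""
    return days * SECONDS_PER_DAY


def size_before_major(base_gb, daily_churn_gb, days_between_majors):
    """Estimate on-disk size just before a major compaction.

    Assumption: none of the new rows, overwrites, or tombstones are
    reclaimed between majors, so growth is linear in the number of days.
    """
    return base_gb + daily_churn_gb * days_between_majors


if __name__ == "__main__":
    print("gc grace, 3 days:", gc_grace_seconds(3), "seconds")
    print("gc grace, 10 days:", gc_grace_seconds(10), "seconds")
    for days in (1, 3):
        estimate = size_before_major(base_gb=20, daily_churn_gb=10, days_between_majors=days)
        print(f"major every {days} day(s): about {estimate} GB before compaction")
```

Under those assumptions, a nightly major peaks at about 30 GB, while waiting three days lands at about 50 GB, which sits inside the 40-60 GB range mentioned above.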
This is a special case, and I really think there might be another storage model yet to be created that is better for me.

On Fri, Jan 10, 2014 at 10:31 PM, Robert Wille <rwi...@fold3.com> wrote:

> I essentially am using a time-based data model. But, if I don’t delete
> obsolete data, my database will quickly become many times larger than
> necessary. After a year, it would probably be 20x the size it would be if I
> cleaned out obsolete data.
>
> Based on an analysis of my schema and access patterns, I’m pretty
> confident that things will go well. I just wish that confidence were backed
> up by experience rather than conjecture.
>
> Robert
>
> From: Anthony Grasso <anthony.gra...@gmail.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Friday, January 10, 2014 at 7:12 PM
> To: <user@cassandra.apache.org>
> Subject: Re: Gotchas when creating a lot of tombstones
>
> Hi Robert,
>
> It sounds like you have done a fair bit investigating and testing already.
> Have you considered using a time based data model to avoid doing deletions
> in the database?
>
> Regards,
> Anthony
>
>
> On Thu, Jan 9, 2014 at 1:26 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>
>> With Level compaction, you will have some data which could not be
>> reclaimed with gc grace=0 because it has not compacted yet. For this you
>> might want to look at tombstone_threshold
>>
>>
>> On Wed, Jan 8, 2014 at 10:31 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>
>>>
>>> On Wed, Jan 1, 2014 at 7:53 AM, Robert Wille <rwi...@fold3.com> wrote:
>>>
>>>>
>>>> Also, for this application, it would be quite reasonable to set gc
>>>> grace seconds to 0 for these tables. Zombie data wouldn’t really be a
>>>> problem. The background process that cleans up orphaned browse structures
>>>> would simply re-delete any deleted data that reappeared.
>>>>
>>>
>>> If you can set gc grace to 0, that will basically eliminate your
>>> tombstone concerns entirely, so I would suggest that.
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>