I've seen a number of people on this forum who have run into a variety of issues when creating a lot of tombstones, which makes me worried about an application I'm writing. It works great on my test server, but I'm concerned about what will happen when I move my production system onto it. I'm hoping you good people can help me figure out whether I'm likely to run into problems.
Here are the basics of my application. I'm using Cassandra to store the browse structure for a document store. My documents are grouped into titles. When a title changes structurally (e.g. documents are added or removed), I don't modify its browse structure in place; instead I build a new browse structure for that title. A nightly process goes through my tables and deletes orphaned browse structures.
That's the part that concerns me, because it can generate a lot of tombstones in a fairly short period of time. When I move my production database to Cassandra, I expect to have between 0.5 and 1 billion total records across five tables. On a typical day I might replace 5% of those records, which means I'll generally create a few tens of millions of tombstones per day. However, one title makes up about half my database, and I might update it once every month or two. When I do, the database will grow by about 50%, and then about 33% of it will become tombstones (several hundred million of them).
I've analyzed my queries, and I don't believe any single query will encounter more than a few hundred tombstones (except for the cleanup process itself). My primary concern is what will happen when repair and compaction hit all those tombstones.
If it would make life easier for Cassandra, it would be quite reasonable to throttle the rate at which I create tombstones, as long as that rate is high enough to clean up my big title within a couple of weeks. I could limit the rate either by stopping the cleanup once some threshold has been reached, or by using a rate limiter. I don't know whether spreading the tombstones out over time helps (or even whether this is likely to be a problem at all).
Also, for this application it would be quite reasonable to set gc_grace_seconds to 0 for these tables. Zombie data wouldn't really be a problem.
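To make the throttling options concrete, here is a minimal Python sketch that is not tied to any Cassandra driver. It assumes a hypothetical `delete` callback that issues the actual CQL DELETE for one orphaned browse structure, hypothetical sets of stored and live structure IDs, and (for the sizing) one tombstone per delete. `required_rate` does the arithmetic: at the upper estimate (1 billion records, half in the big title), retiring about 500 million rows in 14 days needs roughly 413 deletes per second; at the lower estimate (250 million rows), roughly 207. `IntervalRateLimiter` is the rate-limiter option and `max_deletes` is the "stop after a threshold" option.

```python
import time
from typing import Callable, Iterable, Optional, Set

SECONDS_PER_DAY = 86_400


def required_rate(tombstones: int, days: float) -> float:
    """Minimum deletes/second needed to issue `tombstones` deletes within `days` days.

    Assumes (hypothetically) one delete statement per tombstone.
    """
    return tombstones / (days * SECONDS_PER_DAY)


class IntervalRateLimiter:
    """Fixed-interval limiter: at most `rate` acquisitions per second, no bursting."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_time = clock()  # earliest moment the next delete may run

    def acquire(self) -> None:
        now = self.clock()
        if self.next_time > now:
            self.sleep(self.next_time - now)  # wait for our slot
        # Reserve the following slot; idle time is not banked as burst credit.
        self.next_time = max(self.next_time, now) + self.interval


def cleanup_orphans(stored_ids: Iterable[str],
                    live_ids: Set[str],
                    delete: Callable[[str], None],
                    limiter: IntervalRateLimiter,
                    max_deletes: Optional[int] = None) -> int:
    """Delete every stored browse structure that no title references.

    `delete` is a hypothetical callback that issues the real CQL DELETE.
    `max_deletes` caps one run; the next nightly run picks up the remainder.
    Returns the number of deletes issued.
    """
    deleted = 0
    for sid in stored_ids:
        if sid in live_ids:
            continue  # still referenced by a current browse structure
        if max_deletes is not None and deleted >= max_deletes:
            break  # threshold reached: stop for tonight
        limiter.acquire()
        delete(sid)
        deleted += 1
    return deleted


if __name__ == "__main__":
    # Sizing for the big title: 0.5 billion tombstones retired over two weeks.
    rate = required_rate(500_000_000, days=14)
    print(f"need at least {rate:.0f} deletes/s")
```

Whether spacing the deletes out actually eases repair and compaction is the open question here; the sketch only shows the mechanics. Because the cleanup is driven by "stored but not live", re-running it is idempotent: anything that reappears is simply found again and re-deleted.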
The background process that cleans up orphaned browse structures would simply re-delete any previously deleted data that reappeared.
I would be very grateful if someone with experience running an application that creates a lot of tombstones could help me understand how Cassandra is likely to behave in this kind of scenario.
Thanks,
Robert