I've seen a number of people on this forum run into a variety of issues
when creating a lot of tombstones. This makes me concerned about an
application that I'm writing. It works great on my test server, but I'm
concerned about what will happen when I move my production database to it.
I'm wondering if you good people can help me figure out whether I'm likely
to run into problems.

Here are the basics of my application. I'm using Cassandra to store the
browse structure for a document store. My documents are grouped into titles.
When a title changes structurally (e.g. documents are added or removed), I
do not modify its browse structure in place; instead I build a new browse
structure for that title. A nightly process goes through my tables and
deletes orphaned browse structures. That's the part that concerns me: it can
generate a lot of tombstones in a short period of time.

When I move my production database to Cassandra, I expect to have somewhere
between 0.5 billion and 1 billion total records in five tables. On a typical
day, I might replace 5% of those records, which means I'll generally create
a few tens of millions of tombstones per day. However, I have one title that
makes up about half my database, and I might update it once every month or
two. When I do, my database will grow by about 50% (the old and new browse
structures coexist until cleanup), and then about 33% of it will become
tombstones (several hundred million tombstones).
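For a sense of scale, the figures above can be checked with a little
arithmetic. This is a rough sketch that assumes one tombstone per deleted
record; that is an assumption, since a partition-level delete writes a single
tombstone covering every row in the partition, so the real count depends on
how the cleanup issues its deletes. The function names are illustrative only.

```python
def daily_tombstones(total_records: float, replace_fraction: float = 0.05) -> float:
    # Assumes one tombstone per replaced record (row-level deletes).
    return total_records * replace_fraction


def big_title_rebuild(total_records: float,
                      title_share: float = 0.5) -> tuple[float, float, float]:
    """Return (grown size, tombstones, tombstone fraction) for one rebuild."""
    # The new browse structure is written alongside the old one, so the
    # database temporarily grows by the title's share of the records.
    grown = total_records * (1 + title_share)
    # Cleanup then deletes the old copy: one tombstone per old record (assumed).
    tombstones = total_records * title_share
    return grown, tombstones, tombstones / grown


for total in (0.5e9, 1.0e9):
    grown, tombstones, fraction = big_title_rebuild(total)
    print(f"{total / 1e9:.1f}B records: ~{daily_tombstones(total) / 1e6:.0f}M "
          f"tombstones/day; big-title rebuild -> {grown / 1e9:.2f}B records, "
          f"{tombstones / 1e6:.0f}M tombstones ({fraction:.0%})")
```

The 33% follows directly: the old copy of the big title is 0.5 of the
original size, and the grown database is 1.5 times the original, so
0.5 / 1.5 = 1/3 of it ends up as tombstones.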

I've analyzed my queries, and I don't believe any single query will
encounter more than a few hundred tombstones (except for the cleanup
process itself). My primary concern is what will happen when repair and
compaction hit all those tombstones.

If it would make life easier for Cassandra, it would be quite reasonable to
throttle the rate at which I create tombstones, as long as that rate is high
enough to clean up my big title in a couple of weeks. I could limit the rate
either by stopping the cleanup after some threshold has been reached, or by
using a rate limiter. I don't know whether spreading the tombstones out over
time helps (or even whether this is likely to be a problem at all).
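To put a number on "a couple of weeks": issuing 500 million deletes over 14
days works out to roughly 413 deletes per second (500 million /
(14 * 86,400 s)). Below is a minimal sketch of both throttling options,
assuming a hypothetical `delete(key)` callback that issues one DELETE per
orphaned record; `RateLimiter` and `throttled_cleanup` are illustrative
names, not part of any Cassandra or driver API. The clock and sleep
functions are injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Hashable, Iterable, Optional


def required_rate(tombstones: float, days: float) -> float:
    """Deletes per second needed to issue `tombstones` deletes within `days` days."""
    return tombstones / (days * 86_400)


class RateLimiter:
    """Spaces calls to acquire() at least 1/rate seconds apart (no bursting)."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_time = clock()

    def acquire(self) -> None:
        now = self.clock()
        if now < self.next_time:
            # Too early: wait until the next slot opens.
            self.sleep(self.next_time - now)
        # Schedule the following slot; idle time is not banked as credit.
        self.next_time = max(self.next_time, now) + self.interval


def throttled_cleanup(keys: Iterable[Hashable],
                      delete: Callable[[Hashable], None],
                      limiter: RateLimiter,
                      max_deletes: Optional[int] = None) -> int:
    """Delete orphaned keys at a bounded rate, optionally stopping at max_deletes."""
    count = 0
    for key in keys:
        if max_deletes is not None and count >= max_deletes:
            break  # threshold reached; a later nightly run would pick up the rest
        limiter.acquire()
        delete(key)  # hypothetical: one DELETE per orphaned record
        count += 1
    return count


if __name__ == "__main__":
    print(f"{required_rate(500e6, 14):.0f} deletes/s to clear 500M in 14 days")
    limiter = RateLimiter(rate=5.0)
    deleted = throttled_cleanup(range(10), lambda key: None, limiter, max_deletes=3)
    print(f"deleted {deleted} keys")
```

The limiter paces deletes evenly rather than in bursts and does not
accumulate credit while idle, so a stalled cleanup does not catch up with a
burst afterwards. Whether even pacing actually eases compaction and repair
is the open question above; the sketch only shows the mechanics.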

Also, for this application it would be quite reasonable to set
gc_grace_seconds to 0 for these tables. Zombie data wouldn't really be a
problem: the background process that cleans up orphaned browse structures
would simply re-delete any deleted data that reappeared.
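The setting itself is a per-table option changed with ALTER TABLE. A small
sketch that emits the CQL for each table; the keyspace and table names are
hypothetical placeholders for the five real tables, and the resulting
statements could be pasted into cqlsh.

```python
def gc_grace_statements(keyspace: str, tables: list[str], seconds: int = 0) -> list[str]:
    """Build one ALTER TABLE statement per table setting gc_grace_seconds."""
    if seconds < 0:
        raise ValueError("gc_grace_seconds cannot be negative")
    return [f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {seconds};"
            for table in tables]


# Hypothetical names: substitute the real keyspace and browse tables.
for stmt in gc_grace_statements("browse", ["title_nodes", "title_children"]):
    print(stmt)
```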

I would be very grateful if someone with experience running an application
that creates a lot of tombstones could help me understand how Cassandra is
likely to behave in this kind of scenario.

Thanks

Robert