Hi, I'm looking for suggestions/caveats on using TTL as a substitute for a manual data purge job.
We have a few tables that hold user information (guest or registered users), with roughly 500K to 1M records created per day per table. Currently these tables have a secondary-indexed updated_date column which is populated on each update. However, we have been getting timeouts when running queries on updated_date once the record count grows, so I don't think this will be a reliable option in the long term when we need to purge records that have not been used for the last X days.

In this scenario, is it advisable to set a high enough TTL (i.e. the amount of time we want these records to last, likely 3 to 6 months) when inserting/updating records? There could be cases where the TTL gets reset after a couple of days/weeks, when the user visits the site again. The tables have a fixed number of columns, except for one table which has a clustering key and may hold at most 10 rows per partition key. I've sketched the current layout and the TTL write pattern I have in mind at the end of this message.

What I need to know is the overhead of having so many rows with TTLs hanging around for a relatively long duration (weeks/months), and the impact this could have on performance and storage. If this is not a recommended approach, what would be an alternative design for a manual purge job that does not rely on secondary indices?

We are using Cassandra 2.0.x.

Thanks,
Joseph
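P.S. For reference, here is roughly what the current layout looks like; all table and column names below are invented for illustration:

    -- Rough sketch of the current layout (names made up):
    CREATE TABLE user_info (
        user_id      text,
        attr         text,        -- clustering column; at most ~10 rows per partition
        value        text,
        updated_date timestamp,   -- rewritten on every update
        PRIMARY KEY (user_id, attr)
    );

    -- The purge job finds stale records through this index:
    CREATE INDEX user_info_updated ON user_info (updated_date);

The queries the purge job runs against that updated_date index are the ones that start timing out as the daily volume adds up.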
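And this is the TTL write pattern I have in mind, using the same invented names; the 90-day TTL is just an illustrative value within the 3-6 month window:

    -- Proposed pattern: write everything with a TTL instead of purging.
    -- 7776000 seconds = 90 days.
    INSERT INTO user_info (user_id, attr, value, updated_date)
    VALUES ('u123', 'email', 'joe@example.com', dateOf(now()))
    USING TTL 7776000;

    -- When the user visits again, rewriting the row resets the TTL clock.
    -- (TTL is set per column written, so the update has to touch every
    -- column we want to keep alive.)
    UPDATE user_info USING TTL 7776000
    SET value = 'joe@example.com', updated_date = dateOf(now())
    WHERE user_id = 'u123' AND attr = 'email';

    -- Remaining TTL on a column can be inspected with the TTL() function:
    SELECT TTL(value) FROM user_info WHERE user_id = 'u123' AND attr = 'email';

One thing I noticed while sketching this is the per-column nature of TTL, so the refresh on a repeat visit would have to rewrite every column we want to keep around; that's part of what I'm unsure about overhead-wise.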