That still involves quite a bit of infrastructure work--it also means that to query for audit information, I would have to make N queries, one per table (audit information is sorted by a key identifying the item, and then the date). I don't think this would yield any benefit (to me) over simply tombstoning the values, or creating a secondary index on date and doing a DELETE, right?
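To make that fan-out concrete, here is a minimal, self-contained sketch (plain Java, no driver dependency; the audit keyspace, the audit_YYYYMMDD table naming scheme, and the item_id column are all made up for illustration) of what one logical read looks like against a table-per-day layout:

    // Sketch of the read fan-out a table-per-day layout implies: one SELECT per
    // daily table across the retention window, i.e. N queries for a single lookup.
    // Names (audit keyspace, audit_YYYYMMDD tables, item_id column) are hypothetical.
    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.util.ArrayList;
    import java.util.List;

    public class PerDayTableFanout {
        private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMMdd");

        // Build one SELECT statement per daily table covering the retention window.
        static List<String> auditQueries(String itemId, LocalDate newest, int retentionDays) {
            List<String> statements = new ArrayList<>();
            for (int i = 0; i < retentionDays; i++) {
                String table = "audit_" + newest.minusDays(i).format(SUFFIX);
                statements.add("SELECT * FROM audit." + table + " WHERE item_id = '" + itemId + "';");
            }
            return statements;
        }

        public static void main(String[] args) {
            // A 60-day window means 60 separate queries just to read one item's audit trail.
            auditQueries("item-42", LocalDate.of(2014, 6, 4), 60).forEach(System.out::println);
        }
    }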
Is there something internally preventing me from implementing this as a separate Strategy?

On Wed, Jun 4, 2014 at 10:47 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> I'd suggest creating 1 table per day, and dropping the tables you don't need once you're done.

> On Wed, Jun 4, 2014 at 10:44 AM, Redmumba <redmu...@gmail.com> wrote:

>> Sorry, yes, that is what I was looking to do--i.e., create a "TopologicalCompactionStrategy" or similar.

>> On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:

>>> Maybe I’m misunderstanding something, but what makes you think that running a major compaction every day will cause the data from January 1st to exist in only one SSTable and not have data from other days in the SSTable as well? Are you talking about making a new compaction strategy that creates SSTables by day?

>>> On June 4, 2014 at 1:36:10 PM, Redmumba (redmu...@gmail.com) wrote:

>>> Let's say I run a major compaction every day, so that the "oldest" sstable contains only the data for January 1st. Assuming all the nodes are in-sync and have had at least one repair run before the table is dropped (so that all information for that time period is "the same"), wouldn't it be safe to assume that the same data would be dropped on all nodes? There might be a period when the compaction is running where different nodes might have an inconsistent view of just that day's data (in that some would have it and others would not), but the cluster would still function and become eventually consistent, correct?

>>> Also, if the entirety of the sstable is being dropped, wouldn't the tombstones be removed with it? I wouldn't be concerned with individual rows and columns, and this is a write-only table, more or less--the only deletes that occur in the current system are to delete the old data.

>>> On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:

>>>> I’m not sure what you want to do is feasible. At a high level I can see you running into issues with RF etc. The SSTables node to node are not identical, so if you drop a full SSTable on one node there is no corresponding SSTable on the adjacent nodes to drop. You would need to choose data to compact out, and ensure it is removed on all replicas as well. But if your problem is that you’re low on disk space then you probably won’t be able to write out a new SSTable with the older information compacted out. Also, there is more to an SSTable than just data; the SSTable could have tombstones and other relics that haven’t been cleaned up from nodes coming or going.

>>>> On June 4, 2014 at 1:10:58 PM, Redmumba (redmu...@gmail.com) wrote:

>>>> Thanks, Russell--yes, a similar concept, just applied to sstables. I'm assuming this would require changes to both major compactions, and probably GC (to remove the old tables), but since I'm not super-familiar with the C* internals, I wanted to make sure it was feasible with the current toolset before I actually dived in and started tinkering.

>>>> Andrew

>>>> On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:

>>>>> hmm, I see. So something similar to Capped Collections in MongoDB.
>>>>> On June 4, 2014 at 1:03:46 PM, Redmumba (redmu...@gmail.com) wrote:

>>>>> Not quite; if I'm at say 90% disk usage, I'd like to drop the oldest sstable rather than simply run out of space.

>>>>> The problem with using TTLs is that I have to try and guess how much data is being put in--since this is auditing data, the usage can vary wildly depending on time of year, verbosity of auditing, etc. I'd like to maximize the disk space--not optimize the cleanup process.

>>>>> Andrew

>>>>> On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:

>>>>>> You mean this:

>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-5228

>>>>>> ?

>>>>>> On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote:

>>>>>> Good morning!

>>>>>> I've asked (and seen other people ask) about the ability to drop old sstables, basically creating a FIFO-like clean-up process. Since we're using Cassandra as an auditing system, this is particularly appealing to us because it means we can maximize the amount of auditing data we can keep while still allowing Cassandra to clear old data automatically.

>>>>>> My idea is this: perform compaction based on the range of dates available in the sstable (or just metadata about when it was created). For example, a major compaction could create a combined sstable per day--so that, say, 60 days of data after a major compaction would contain 60 sstables.

>>>>>> My question then is, will this be possible by simply implementing a separate AbstractCompactionStrategy? Does this sound feasible at all? Based on the implementation of the Size and Leveled strategies, it looks like I would have the ability to control what and how things get compacted, but I wanted to verify before putting time into it.

>>>>>> Thank you so much for your time!

>>>>>> Andrew

> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
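For anyone curious what the per-day bucketing behind such a strategy might look like, here is a rough, standalone sketch of just the selection logic. It deliberately does not extend Cassandra's real AbstractCompactionStrategy (whose method signatures vary between versions); SSTableInfo below is a made-up stand-in for the per-SSTable metadata the real strategy would read off SSTableReader (min/max write timestamps).

    // Standalone sketch of the selection logic a day-based ("topological") compaction
    // strategy would need. SSTableInfo is a hypothetical stand-in for per-SSTable
    // metadata; nothing here calls Cassandra's actual compaction API.
    import java.time.Instant;
    import java.time.LocalDate;
    import java.time.ZoneOffset;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class DayBucketing {

        /** Hypothetical SSTable metadata: a name plus min/max write timestamps in microseconds. */
        static final class SSTableInfo {
            final String name;
            final long minTimestampMicros;
            final long maxTimestampMicros;

            SSTableInfo(String name, long minTimestampMicros, long maxTimestampMicros) {
                this.name = name;
                this.minTimestampMicros = minTimestampMicros;
                this.maxTimestampMicros = maxTimestampMicros;
            }
        }

        /** Group SSTables by the UTC day of their newest cell; each bucket is one per-day compaction candidate. */
        static Map<LocalDate, List<SSTableInfo>> bucketByDay(List<SSTableInfo> sstables) {
            Map<LocalDate, List<SSTableInfo>> buckets = new TreeMap<>();
            for (SSTableInfo s : sstables) {
                LocalDate day = Instant.ofEpochMilli(s.maxTimestampMicros / 1000)
                                       .atZone(ZoneOffset.UTC)
                                       .toLocalDate();
                buckets.computeIfAbsent(day, d -> new ArrayList<>()).add(s);
            }
            return buckets;
        }

        /**
         * An SSTable is only safe to drop wholesale if everything in it is older than the cutoff;
         * if any cell is newer, deleting the file would also discard data that is still wanted.
         */
        static List<SSTableInfo> droppable(List<SSTableInfo> sstables, Instant cutoff) {
            long cutoffMicros = cutoff.toEpochMilli() * 1000;
            List<SSTableInfo> result = new ArrayList<>();
            for (SSTableInfo s : sstables) {
                if (s.maxTimestampMicros < cutoffMicros) {
                    result.add(s);
                }
            }
            return result;
        }
    }

Note that this only covers the per-node file selection; as Russell points out above, replicas do not share SSTable boundaries, so each node would have to make this decision independently against its own files, and consistency of the surviving data would still rest on repair rather than on matching files node to node.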