Sorry, yes, that is what I was looking to do--i.e., create a "TopologicalCompactionStrategy" or similar.
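
As a rough illustration of the idea (group sstables by day, then drop the oldest day once disk usage passes a threshold such as 90%), the selection logic might look something like the sketch below. It is plain Python, not Cassandra code: `SSTableInfo`, `bucket_by_day`, and `days_to_drop` are hypothetical names invented for this example, and it assumes each sstable exposes a timezone-aware newest-write timestamp and a size in bytes. It does not touch AbstractCompactionStrategy or any real Cassandra API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical stand-in for sstable metadata; not a Cassandra class."""
    name: str
    max_timestamp: datetime  # newest write in the sstable (timezone-aware)
    size_bytes: int


def bucket_by_day(sstables: Iterable[SSTableInfo]) -> Dict[date, List[SSTableInfo]]:
    """Group sstables by the UTC calendar day of their newest write."""
    buckets = defaultdict(list)
    for table in sstables:
        day = table.max_timestamp.astimezone(timezone.utc).date()
        buckets[day].append(table)
    return dict(buckets)


def days_to_drop(buckets: Dict[date, List[SSTableInfo]],
                 used_bytes: int,
                 capacity_bytes: int,
                 threshold: float = 0.90) -> List[date]:
    """Pick the oldest days to drop until usage falls below the threshold."""
    dropped = []
    for day in sorted(buckets):  # oldest day first (FIFO)
        if used_bytes / capacity_bytes < threshold:
            break
        used_bytes -= sum(t.size_bytes for t in buckets[day])
        dropped.append(day)
    return dropped
```

This only decides which days would be dropped on a single node. It says nothing about coordinating the drop across replicas, which is the concern raised further down the thread.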

On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:

> Maybe I’m misunderstanding something, but what makes you think that
> running a major compaction every day will cause the data from January 1st
> to exist in only one SSTable and not have data from other days in the
> SSTable as well? Are you talking about making a new compaction strategy
> that creates SSTables by day?
>
> On June 4, 2014 at 1:36:10 PM, Redmumba (redmu...@gmail.com) wrote:
>
> Let's say I run a major compaction every day, so that the "oldest"
> sstable contains only the data for January 1st. Assuming all the nodes are
> in sync and have had at least one repair run before the table is dropped
> (so that all information for that time period is "the same"), wouldn't it
> be safe to assume that the same data would be dropped on all nodes? There
> might be a period when the compaction is running where different nodes
> might have an inconsistent view of just that day's data (in that some would
> have it and others would not), but the cluster would still function and
> become eventually consistent, correct?
>
> Also, if the entirety of the sstable is being dropped, wouldn't the
> tombstones be removed with it? I wouldn't be concerned with individual
> rows and columns, and this is a write-only table, more or less--the only
> deletes that occur in the current system are to delete the old data.
>
> On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry <rbradbe...@gmail.com>
> wrote:
>
>> I’m not sure what you want to do is feasible. At a high level I can
>> see you running into issues with RF etc. The SSTables node to node are not
>> identical, so if you drop a full SSTable on one node there is no one
>> corresponding SSTable on the adjacent nodes to drop. You would need to
>> choose data to compact out, and ensure it is removed on all replicas as
>> well. But if your problem is that you’re low on disk space then you
>> probably won’t be able to write out a new SSTable with the older
>> information compacted out. Also, there is more to an SSTable than just
>> data, the SSTable could have tombstones and other relics that haven’t been
>> cleaned up from nodes coming or going.
>>
>> On June 4, 2014 at 1:10:58 PM, Redmumba (redmu...@gmail.com) wrote:
>>
>> Thanks, Russell--yes, a similar concept, just applied to sstables.
>> I'm assuming this would require changes to both major compactions, and
>> probably GC (to remove the old tables), but since I'm not super-familiar
>> with the C* internals, I wanted to make sure it was feasible with the
>> current toolset before I actually dived in and started tinkering.
>>
>> Andrew
>>
>> On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry <rbradbe...@gmail.com>
>> wrote:
>>
>>> Hmm, I see. So something similar to Capped Collections in MongoDB.
>>>
>>> On June 4, 2014 at 1:03:46 PM, Redmumba (redmu...@gmail.com) wrote:
>>>
>>> Not quite; if I'm at say 90% disk usage, I'd like to drop the oldest
>>> sstable rather than simply run out of space.
>>>
>>> The problem with using TTLs is that I have to try and guess how much
>>> data is being put in--since this is auditing data, the usage can vary
>>> wildly depending on time of year, verbosity of auditing, etc. I'd like to
>>> maximize the disk space--not optimize the cleanup process.
>>>
>>> Andrew
>>>
>>> On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry <rbradbe...@gmail.com>
>>> wrote:
>>>
>>>> You mean this:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5228
>>>>
>>>> ?
>>>>
>>>> On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote:
>>>>
>>>> Good morning!
>>>>
>>>> I've asked (and seen other people ask) about the ability to drop old
>>>> sstables, basically creating a FIFO-like clean-up process. Since we're
>>>> using Cassandra as an auditing system, this is particularly appealing to us
>>>> because it means we can maximize the amount of auditing data we can keep
>>>> while still allowing Cassandra to clear old data automatically.
>>>>
>>>> My idea is this: perform compaction based on the range of dates
>>>> available in the sstable (or just metadata about when it was created). For
>>>> example, a major compaction could create a combined sstable per day--so
>>>> that, say, 60 days of data after a major compaction would contain 60
>>>> sstables.
>>>>
>>>> My question then is, will this be possible by simply implementing a
>>>> separate AbstractCompactionStrategy? Does this sound feasible at all?
>>>> Based on the implementation of Size and Leveled strategies, it looks like I
>>>> would have the ability to control what and how things get compacted, but I
>>>> wanted to verify before putting time into it.
>>>>
>>>> Thank you so much for your time!
>>>>
>>>> Andrew