Re: Customized Compaction Strategy: Dev Questions

Redmumba Wed, 04 Jun 2014 10:46:37 -0700

Sorry, yes, that is what I was looking to do--i.e., create a
"TopologicalCompactionStrategy" or similar.
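
To make the idea concrete, here is a minimal sketch of just the selection logic: bucket sstables by day, then pick the oldest day buckets to drop once disk usage reaches a threshold. This is plain Python, not Cassandra code. SSTableInfo, group_by_day, days_to_drop, and the 0.90 threshold are all hypothetical names and values chosen for illustration, and the sketch assumes each sstable exposes a maximum write timestamp in microseconds. It does not address the replica-consistency concern raised in the quoted messages below.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical stand-in for per-sstable metadata (not a Cassandra class)."""
    name: str
    max_timestamp_micros: int  # assumed: newest write timestamp in the sstable
    size_bytes: int


def day_of(table: SSTableInfo) -> date:
    """Map an sstable to the UTC calendar day of its newest write."""
    seconds = table.max_timestamp_micros // 1_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()


def group_by_day(tables: list[SSTableInfo]) -> dict[date, list[SSTableInfo]]:
    """Bucket sstables by day; each bucket is a candidate for one compaction."""
    buckets: dict[date, list[SSTableInfo]] = defaultdict(list)
    for table in tables:
        buckets[day_of(table)].append(table)
    return dict(buckets)


def days_to_drop(buckets: dict[date, list[SSTableInfo]],
                 disk_used: int,
                 disk_total: int,
                 threshold: float = 0.90) -> list[date]:
    """Return the oldest days to drop, FIFO, until usage falls below threshold."""
    dropped: list[date] = []
    used = disk_used
    for day in sorted(buckets):  # oldest day first
        if used / disk_total < threshold:
            break
        dropped.append(day)
        used -= sum(t.size_bytes for t in buckets[day])
    return dropped
```

In a real strategy the grouping step would presumably live in a subclass of AbstractCompactionStrategy, and the drop step would need to be coordinated so that every replica removes the same day.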



On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry <rbradbe...@gmail.com>
wrote:

> Maybe I’m misunderstanding something, but what makes you think that
> running a major compaction every day will cause the data from January 1st
> to exist in only one SSTable and not have data from other days in the
> SSTable as well? Are you talking about making a new compaction strategy
> that creates SSTables by day?
>
>
>
> On June 4, 2014 at 1:36:10 PM, Redmumba (redmu...@gmail.com) wrote:
>
>  Let's say I run a major compaction every day, so that the "oldest"
> sstable contains only the data for January 1st.  Assuming all the nodes are
> in-sync and have had at least one repair run before the table is dropped
> (so that all information for that time period is "the same"), wouldn't it
> be safe to assume that the same data would be dropped on all nodes?  There
> might be a period when the compaction is running where different nodes
> might have an inconsistent view of just that day's data (in that some would
> have it and others would not), but the cluster would still function and
> become eventually consistent, correct?
>
> Also, if the entirety of the sstable is being dropped, wouldn't the
> tombstones be removed with it?  I wouldn't be concerned with individual
> rows and columns, and this is a write-only table, more or less--the only
> deletes that occur in the current system are to delete the old data.
>
>
> On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry <rbradbe...@gmail.com>
> wrote:
>
>>  I’m not sure what you want to do is feasible.  At a high level I can
>> see you running into issues with RF, etc.  The SSTables are not identical
>> from node to node, so if you drop a full SSTable on one node there is no
>> single corresponding SSTable on the adjacent nodes to drop.  You would need
>> to choose data to compact out, and ensure it is removed on all replicas as
>> well.  But if your problem is that you’re low on disk space then you
>> probably won’t be able to write out a new SSTable with the older
>> information compacted out.  Also, there is more to an SSTable than just
>> data; the SSTable could have tombstones and other relics that haven’t been
>> cleaned up from nodes coming or going.
>>
>>
>>
>>
>> On June 4, 2014 at 1:10:58 PM, Redmumba (redmu...@gmail.com) wrote:
>>
>>   Thanks, Russell--yes, a similar concept, just applied to sstables.
>> I'm assuming this would require changes to both major compactions, and
>> probably GC (to remove the old tables), but since I'm not super-familiar
>> with the C* internals, I wanted to make sure it was feasible with the
>> current toolset before I actually dived in and started tinkering.
>>
>> Andrew
>>
>>
>> On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry <rbradbe...@gmail.com>
>> wrote:
>>
>>>  hmm, I see. So something similar to Capped Collections in MongoDB.
>>>
>>>
>>>
>>> On June 4, 2014 at 1:03:46 PM, Redmumba (redmu...@gmail.com) wrote:
>>>
>>>   Not quite; if I'm at say 90% disk usage, I'd like to drop the oldest
>>> sstable rather than simply run out of space.
>>>
>>> The problem with using TTLs is that I have to try and guess how much
>>> data is being put in--since this is auditing data, the usage can vary
>>> wildly depending on time of year, verbosity of auditing, etc.  I'd like to
>>> maximize the disk space--not optimize the cleanup process.
>>>
>>> Andrew
>>>
>>>
>>> On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry <rbradbe...@gmail.com>
>>> wrote:
>>>
>>>>  You mean this:
>>>>
>>>>  https://issues.apache.org/jira/browse/CASSANDRA-5228
>>>>
>>>>  ?
>>>>
>>>>
>>>>
>>>> On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote:
>>>>
>>>>   Good morning!
>>>>
>>>> I've asked (and seen other people ask) about the ability to drop old
>>>> sstables, basically creating a FIFO-like clean-up process.  Since we're
>>>> using Cassandra as an auditing system, this is particularly appealing to us
>>>> because it means we can maximize the amount of auditing data we can keep
>>>> while still allowing Cassandra to clear old data automatically.
>>>>
>>>> My idea is this: perform compaction based on the range of dates
>>>> available in the sstable (or just metadata about when it was created).  For
>>>> example, a major compaction could create a combined sstable per day--so
>>>> that, say, 60 days of data after a major compaction would be stored in 60
>>>> sstables.
>>>>
>>>> My question then is, will this be possible by simply implementing a
>>>> separate AbstractCompactionStrategy?  Does this sound feasible at all?
>>>> Based on the implementation of Size and Leveled strategies, it looks like I
>>>> would have the ability to control what and how things get compacted, but I
>>>> wanted to verify before putting time into it.
>>>>
>>>> Thank you so much for your time!
>>>>
>>>> Andrew
>>>>
>>>>
>>>
>>
>
