> Are there other performance considerations that I need to keep in mind?
That's about it.

Sylvain has written a script, or some such, to reverse compaction. It was 
mentioned sometime in the last month, I think. Sylvain?

> after we complete the migration should be fairly small (about 500,000 
> skinny rows per node, including replicas).
If we are talking about, say, 500MB of data, I would go with a major compaction 
run gc_grace_seconds after the deletion. You may want to temporarily reduce 
gc_grace_seconds on the CF so you don't have to wait around for 10 days :)
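
If you'd rather script that than drive nodetool by hand, the major compaction 
is reachable over JMX. A minimal sketch, assuming the 1.1-era StorageService 
MBean; MyKeyspace / MyCF are placeholders, and the gc_grace change itself can 
be done first in cassandra-cli, e.g. "update column family MyCF with 
gc_grace = 3600;":

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class MajorCompaction {
        public static void main(String[] args) throws Exception {
            // 7199 is the default Cassandra JMX port.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // The same operation "nodetool compact MyKeyspace MyCF" calls:
            // compact all SSTables of the CF into a single new one.
            mbs.invoke(
                    new ObjectName("org.apache.cassandra.db:type=StorageService"),
                    "forceTableCompaction",
                    new Object[] { "MyKeyspace", new String[] { "MyCF" } },
                    new String[] { String.class.getName(),
                            String[].class.getName() });
            jmxc.close();
        }
    }

Remember to put gc_grace_seconds back afterwards; if a node stays down longer 
than the reduced window, repair can resurrect the deleted data.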

Once you have purged the deleted data, keep an eye on the sstables. If it looks 
like the one created by the major compaction is not going to get compacted for 
a long time, come back and ask about "anti compacting" / "sstable split".
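
(If you want to watch the sstables programmatically rather than with nodetool 
cfstats, the per-CF MBean exposes the counts; a fragment reusing the JMX 
connection from the sketch above, again with placeholder names:)

    // Live SSTable count and on-disk size for one CF.
    ObjectName cf = new ObjectName(
            "org.apache.cassandra.db:type=ColumnFamilies,"
            + "keyspace=MyKeyspace,columnfamily=MyCF");
    int sstables = (Integer) mbs.getAttribute(cf, "LiveSSTableCount");
    long bytes = (Long) mbs.getAttribute(cf, "LiveDiskSpaceUsed");
    System.out.println(sstables + " live sstables, " + bytes + " bytes");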

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/01/2013, at 11:31 AM, Mike <mthero...@yahoo.com> wrote:

> Thanks Aaron, I appreciate it.
> 
> It is my understanding that major compactions are not recommended because they 
> essentially create one massive SSTable that will not compact with any new 
> SSTables for some time.  I can see how this might be a performance concern in 
> the general case, because any read operation would always require multiple 
> disk reads across multiple SSTables.  In addition, information in the new 
> table that is deleted by subsequent tombstones will not be purged until that 
> table can be compacted.  This might then require regular major compactions to 
> be able to clear that data.  Are there other performance considerations that I 
> need to keep in mind?
> 
> However, this might not be as much of an issue in our use case.
> 
> It just so happens that the data in this column family is changed very 
> infrequently, except for deletes (which started recently, and will now occur 
> over time).  In this case, I don't believe having data spread across the 
> SSTables will be an issue: either the data will have a tombstone (which causes 
> Cassandra to stop looking at other SSTables), or the data will be in one 
> SSTable.  So I do not believe I/O will end up being an issue here.
> 
> What may be an issue is cleaning out old data in the SSTable that will exist 
> after a major compaction.  However, this might not require major compactions 
> to happen nearly as frequently as I've seen recommended (once every gc_grace 
> period), or at all.  With the new design, data will be deleted from this 
> table after a number of days.  Deletes against the remaining data after a 
> major compaction might not get processed until the next major compaction, but 
> any deletes against new data should be purged normally through minor 
> compactions.  In addition, the remaining data after we complete the migration 
> should be fairly small (about 500,000 skinny rows per node, including 
> replicas).
> 
> Any other thoughts on this?
> -Mike
> 
> 
> On 1/6/2013 3:49 PM, aaron morton wrote:
>>> When these rows are deleted, tombstones will be created and stored in more 
>>> recent sstables.  Upon compaction of sstables, and after gc_grace_period, I 
>>> presume cassandra will have removed all traces of that row from disk.
>> Yes.
>> When using Size Tiered compaction (the default) tombstones are purged when 
>> all fragments of a row are included in a compaction. So if you have rows 
>> which are written to for A Very Long Time(™) it can take a while for 
>> everything to get purged.
>> 
>> In the normal case though it's not a concern.
>> 
>>> However, after deleting such a large amount of information, there is no 
>>> guarantee that Cassandra will compact these two tables together, causing 
>>> the data to be deleted (right?).  Therefore, even after gc_grace_period, a 
>>> large amount of space may still be used.
>> In the normal case this is not really an issue.
>> 
>> In your case things sound a little non-normal. If you will have only a few 
>> hundred MBs, or a few GBs, of data left in the CF I would consider 
>> running a major compaction on it.
>> 
>> Major compaction will work on all SSTables and create one big SSTable; this 
>> will ensure all deleted data is purged. We normally caution against this as 
>> the one new file is often very big and will not get compacted for a while. 
>> However if you are deleting lots-o-data it may work. (There is also an anti 
>> compaction script around that may be of use.)
>> 
>> Another alternative is to compact some of the older sstables with newer ones 
>> via User Defined Compaction with JMX.
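>> 
>> (For reference, a minimal sketch of that call, using the same JMX connection 
>> setup as the examples above; the keyspace and -Data.db file names are 
>> placeholders, and I'm assuming the 1.1-era CompactionManager MBean, which 
>> takes a comma-separated list of data file names:)
>> 
>>     mbs.invoke(
>>             new ObjectName("org.apache.cassandra.db:type=CompactionManager"),
>>             "forceUserDefinedCompaction",
>>             new Object[] { "MyKeyspace",
>>                     "MyCF-hd-12-Data.db,MyCF-hd-97-Data.db" },
>>             new String[] { String.class.getName(), String.class.getName() });
>> 
>> (Pick one old SSTable plus a newer one holding the tombstones for the same 
>> rows, so the compaction sees all fragments and can purge them.)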
>> 
>> 
>>> Is there a way, other than a major compaction, to clean up all this old 
>>> data?  I assume a nodetool scrub will clean up old tombstones only if that 
>>> row is not in another sstable?
>> I don't think scrub (or upgradesstables) removes tombstones.
>> 
>>> Do tombstones take up bloomfilter space after gc_grace_period?
>> Any row, regardless of the liveness of the columns, takes up bloom filter 
>> space (in -Filter.db).
>> Once the row is removed it will no longer take up space.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 6/01/2013, at 6:44 AM, Mike <mthero...@yahoo.com> wrote:
>> 
>>> A couple more questions.
>>> 
>>> When these rows are deleted, tombstones will be created and stored in more 
>>> recent sstables.  Upon compaction of sstables, and after gc_grace_period, I 
>>> presume cassandra will have removed all traces of that row from disk.
>>> 
>>> However, after deleting such a large amount of information, there is no 
>>> guarantee that Cassandra will compact these two tables together, causing 
>>> the data to be deleted (right?).  Therefore, even after gc_grace_period, a 
>>> large amount of space may still be used.
>>> 
>>> Is there a way, other than a major compaction, to clean up all this old 
>>> data?  I assume a nodetool scrub will clean up old tombstones only if that 
>>> row is not in another sstable?
>>> 
>>> Do tombstones take up bloomfilter space after gc_grace_period?
>>> 
>>> -Mike
>>> 
>>> On 1/2/2013 6:41 PM, aaron morton wrote:
>>>>> 1) As one can imagine, the index and bloom filter for this column family 
>>>>> is large.  Am I correct to assume that bloom filter and index space will 
>>>>> not be reduced until after gc_grace_period?
>>>> Yes.
>>>> 
>>>>> 2) If I would manually run repair across a cluster, is there a process I 
>>>>> can use to safely remove these tombstones before gc_grace period to free 
>>>>> this memory sooner?
>>>> There is nothing to specifically purge tombstones.
>>>> 
>>>> You can temporarily reduce gc_grace_seconds and then trigger a compaction, 
>>>> either by reducing min_compaction_threshold to 2 and doing a flush, or by 
>>>> kicking off a user-defined compaction using the JMX interface.
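>>>> 
>>>> (Both of those knobs are reachable over JMX as well; a fragment with 
>>>> placeholder names, reusing the connection setup from the examples above. 
>>>> Attribute here is javax.management.Attribute:)
>>>> 
>>>>     ObjectName cf = new ObjectName(
>>>>             "org.apache.cassandra.db:type=ColumnFamilies,"
>>>>             + "keyspace=MyKeyspace,columnfamily=MyCF");
>>>>     // Let any two SSTables compact together (the default minimum is 4).
>>>>     mbs.setAttribute(cf, new Attribute("MinimumCompactionThreshold", 2));
>>>>     // Flush the memtable so a fresh SSTable exists to minor-compact.
>>>>     mbs.invoke(
>>>>             new ObjectName("org.apache.cassandra.db:type=StorageService"),
>>>>             "forceTableFlush",
>>>>             new Object[] { "MyKeyspace", new String[] { "MyCF" } },
>>>>             new String[] { String.class.getName(),
>>>>                     String[].class.getName() });
>>>> 
>>>> (Remember to set MinimumCompactionThreshold back to its old value when you 
>>>> are done.)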
>>>> 
>>>>> 3) Any words of warning when undergoing this?
>>>> Make sure you have a good breakfast.
>>>> (It's more general advice than Cassandra specific.)
>>>> 
>>>> 
>>>> Cheers
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> New Zealand
>>>> 
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 30/12/2012, at 8:51 AM, Mike <mthero...@yahoo.com> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> We are undergoing a change to our internal datamodel that will result in 
>>>>> the eventual deletion of over a hundred million rows from a Cassandra 
>>>>> column family.  From what I understand, this will result in the 
>>>>> generation of tombstones, which will be cleaned up during compaction, 
>>>>> after gc_grace_period time (default: 10 days).
>>>>> 
>>>>> A couple of questions:
>>>>> 
>>>>> 1) As one can imagine, the index and bloom filter for this column family 
>>>>> is large.  Am I correct to assume that bloom filter and index space will 
>>>>> not be reduced until after gc_grace_period?
>>>>> 
>>>>> 2) If I would manually run repair across a cluster, is there a process I 
>>>>> can use to safely remove these tombstones before gc_grace period to free 
>>>>> this memory sooner?
>>>>> 
>>>>> 3) Any words of warning when undergoing this?
>>>>> 
>>>>> We are running Cassandra 1.1.2 on a 6 node cluster with a Replication 
>>>>> Factor of 3.  We use LOCAL_QUORUM consistency for all operations.
>>>>> 
>>>>> Thanks!
>>>>> -Mike
> 
