> periodically trimming the row by deleting the oldest columns, the deleted
> columns won't get cleaned up until all fragments of the row exist in a single
> sstable and that sstable undergoes a compaction?

Nope. They are purged when all of the fragments of the row exist in the same SSTables (plural) being compacted.

Say you create a row and write to it for a while; it may be spread into 2 or 3 new SSTables. When there are 4 they are compacted into one, which will be bigger than the original 4. When there are 4 at the next size bucket they are compacted, and so on. If your row exists in only one size bucket its tombstones will be purged (once gc_grace has passed). If you have a row you have been writing to for a long time it may be spread out in many buckets. That's not normally a big problem, but if you also do lots of deletes the tombstones will not get purged.
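If it helps to see that rule written down, here is a toy sketch in Python. It is purely illustrative: the class, the bucket of 4 and the merge logic are my own simplification of size-tiered compaction, not Cassandra's real code, and gc_grace is ignored.

class SSTable:
    def __init__(self, keys, tombstones=()):
        self.keys = set(keys)               # row keys with any data
        self.tombstones = set(tombstones)   # row keys carrying tombstones


def compact(bucket, all_sstables):
    """Merge one size bucket. A tombstone is only dropped when no SSTable
    outside the bucket also holds a fragment of that row."""
    outside = [s for s in all_sstables if s not in bucket]
    keys = set().union(*(s.keys for s in bucket))
    tombstones = set().union(*(s.tombstones for s in bucket))
    kept = {k for k in tombstones
            if any(k in o.keys or k in o.tombstones for o in outside)}
    return SSTable(keys, kept)


# Row "r" was written long ago, so a fragment sits in a big old SSTable.
# Recent deletes put tombstones for "r" into four small SSTables.
old_big = SSTable(keys=["r"])
small = [SSTable(keys=["r"], tombstones=["r"]) for _ in range(4)]

merged = compact(small, small + [old_big])
print(merged.tombstones)    # {'r'}: the tombstone survives this compaction

# Only when the big SSTable takes part in the same compaction is it dropped.
final = compact([merged, old_big], [merged, old_big])
print(final.tombstones)     # set()

The real thing works on files and columns rather than bare keys, but the shape of the problem is the same: until that old, big SSTable is part of the compaction, the tombstones have to stay.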
Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/12/2012, at 4:45 PM, Mike Smith <m...@mailchannels.com> wrote:

> Thanks for the great explanation.
>
> I'd just like some clarification on the last point. Is it the case that if I
> constantly add new columns to a row, while periodically trimming the row by
> deleting the oldest columns, the deleted columns won't get cleaned up
> until all fragments of the row exist in a single sstable and that sstable
> undergoes a compaction?
>
> If my understanding is correct, do you know if 1.2 will enable cleanup of
> columns in rows that have scattered fragments? Or, should I take a different
> approach?
>
> On Thu, Dec 13, 2012 at 5:52 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> Is it possible to use scrub to accelerate the clean up of expired/deleted
>> data?
> No.
> Scrub, and upgradesstables, are used to re-write each file on disk. Scrub may
> remove some rows from a file because of corruption, however upgradesstables
> will not.
>
> If you have long-lived rows and a mixed workload of writes and deletes there
> are a couple of options.
>
> You can try levelled compaction
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
>
> You can tune the default size-tiered compaction by increasing the
> min_compaction_threshold. This will increase the number of files that must
> exist in each size tier before it will be compacted. As a result the speed at
> which rows move into the higher tiers will slow down.
>
> Note that having lots of files may have a negative impact on read
> performance. You can measure this by looking at the SSTables per read metric
> in the cfhistograms.
>
> Lastly you can run a user defined or major compaction. User defined
> compaction is available via JMX and allows you to compact any file you want.
> Manual / major compaction is available via nodetool. We usually discourage
> its use as it will create one big file that will not get compacted for a
> while.
>
> For background, the tombstones / expired columns for a row are only purged
> from the database when all fragments of the row are in the files being
> compacted. So if you have an old row that is spread out over many files it
> may not get purged.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/12/2012, at 3:01 AM, Mike Smith <m...@mailchannels.com> wrote:
>
>> I'm using 1.0.12 and I find that large sstables tend to get compacted
>> infrequently. I've got data that gets deleted or expired frequently. Is it
>> possible to use scrub to accelerate the clean up of expired/deleted data?
>>
>> --
>> Mike Smith
>> Director Development, MailChannels
>
> --
> Mike Smith
> Director Development, MailChannels