So is this at least a decent candidate for a feature request ticket?
On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:

> I'm particularly interested in getting the tombstones to "promote" up the
> levels of LCS more quickly. Currently they get attached at the low level
> and don't propagate up to higher levels until enough activity at a lower
> level promotes the data. Meanwhile, LCS means compactions can occur in
> parallel at each level. So row tombstones in their own sstables could be
> promoted up the LCS levels preferentially, before normal processes would
> move them up.
>
> So if the delete-only sstables could move up more quickly, the compaction
> at those levels would happen more quickly.
>
> The threshold stuff is nice if I read 7019 correctly, but what is the %
> there? % of rows? % of columns? Or % of the size of the sstable? Row
> tombstones are pretty compact, being just the row key and the tombstone
> marker. So if 7019 is triggered at 10% of the sstable size, even a
> crapton of tombstones deleting practically the entire database would
> only be a small % of the sstable's size.
>
> Since the row tombstones are so compact, that's why I think they are
> good candidates for special handling.
>
> On Tue, Feb 13, 2018 at 5:22 PM, J. D. Jordan <jeremiah.jor...@gmail.com>
> wrote:
>
>> Have you taken a look at the new stuff introduced by
>> https://issues.apache.org/jira/browse/CASSANDRA-7019 ? I think it may
>> go a ways toward reducing the need for something complicated like
>> this, though it is an interesting idea as special handling for bulk
>> deletes. If they were truly just sstables that only contained deletes,
>> the logic from 7019 would probably go a long way. Though if you are
>> bulk-inserting deletes, that is what you would end up with, so maybe
>> it already works.
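[Editor's note: for reference, the knobs that exist in this area are exposed as compaction subproperties. As I understand it, `tombstone_threshold` is a ratio of estimated droppable tombstones to all columns in the sstable, not a fraction of its on-disk size. A sketch (the subproperty names are real Cassandra 2.x compaction options; the keyspace/table names are made up):]

```sql
-- Illustrative only: tuning single-sstable tombstone compaction for an
-- LCS table. ks.events is a hypothetical table.
ALTER TABLE ks.events WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  -- estimated droppable-tombstone ratio at which an sstable becomes a
  -- candidate for a single-sstable tombstone compaction (default 0.2)
  'tombstone_threshold': '0.2',
  -- minimum age, in seconds, before re-attempting a tombstone
  -- compaction on the same sstable (default 86400)
  'tombstone_compaction_interval': '86400',
  -- attempt the compaction even when overlapping sstables would
  -- normally prevent the tombstones from actually being dropped
  'unchecked_tombstone_compaction': 'true'
};
```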
>>
>> -Jeremiah
>>
>> > On Feb 13, 2018, at 6:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> >
>> > On Tue, Feb 13, 2018 at 2:38 PM, Carl Mueller
>> > <carl.muel...@smartthings.com> wrote:
>> >
>> >> I'm in the process of doing my second major data purge from a
>> >> Cassandra system.
>> >>
>> >> Almost all of my purging is done via row tombstones. While performing
>> >> this for the second time, trying to cajole compaction (in 2.1.x,
>> >> LeveledCompaction) to goddamn actually compact the data, I've been
>> >> wondering why there isn't a separate set of sstable infrastructure
>> >> set up for row-deletion tombstones.
>> >>
>> >> I'm imagining that row tombstones would be written to separate
>> >> sstables from mainline data updates/appends and range/column
>> >> tombstones.
>> >>
>> >> By writing them to separate sstables, the compaction systems could
>> >> preferentially merge / process them when compacting sstables.
>> >>
>> >> This would create an additional sstable to look up in the bloom
>> >> filters, granted. I had visions of short-circuiting the lookups to
>> >> other sstables if a row tombstone was present in one of the special
>> >> row-tombstone sstables.
>> >>
>> >>
>> > All of the above sounds really interesting to me, but I suspect it's
>> > a LOT of work to make it happen correctly.
>> >
>> > You'd almost end up with two sets of logs for the LSM: a tombstone
>> > log/generation and a data log/generation, and the tombstone logs
>> > would be read-only inputs to data compactions.
>> >
>> >
>> >> But that would only be possible if there were the notion of a "super
>> >> row tombstone" that permanently deleted a row key, so that all
>> >> future writes would be invalidated. Kind of like how a tombstone
>> >> with a mistakenly huge timestamp becomes a sneaky permanent
>> >> tombstone, but intentional.
>> >> There could be a special operation / statement to undo this
>> >> permanent tombstone, and since the row tombstones would be in their
>> >> own dedicated sstables, they could be processed and compacted more
>> >> quickly, with prioritization by the compactor.
>> >>
>> >>
>> > This part sounds way less interesting to me (other than the fact
>> > that you can already do this with a timestamp in the future, though
>> > it'll gc away at gcgs).
>> >
>> >
>> >> I'm thinking there must be something I am forgetting in the
>> >> read/write/compaction paths that invalidates this.
>> >>
>> >
>> > There are a lot of places where we do "smart" things to make sure we
>> > don't accidentally resurrect data. The read path includes old
>> > sstables for tombstones, for example. Those all need to be concretely
>> > identified and handled (and tested).
>> >
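[Editor's note: to make the "sneaky permanent tombstone" concrete, here is the future-timestamp trick sketched in CQL. The table and key are made up; write timestamps are microseconds since the epoch, so the value below is far in the future:]

```sql
-- Illustrative only: a delete written with a far-future timestamp
-- shadows any later write whose timestamp is lower, acting as a
-- de facto permanent tombstone until it is eventually purged after
-- gc_grace_seconds elapses.
DELETE FROM ks.events USING TIMESTAMP 9999999999999999
WHERE rowkey = 'user-123';
```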