In a way, yes. A tombstone will be removed after gc_grace only if the compaction is sure it covers all rows that the tombstone might shadow. When two conflicting non-tombstone rows are compacted, resolution is always just last-write-wins (LWW).
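A minimal sketch of those two rules, using hypothetical types and names rather than Cassandra's actual internals (the real compaction code is considerably more involved):

    // Illustration only -- not Cassandra's actual compaction code.
    final class Cell {
        final long writeTimestampMicros;  // write timestamp used for conflict resolution
        final int localDeletionTimeSecs;  // when the tombstone was created (tombstones only)
        final boolean isTombstone;

        Cell(long writeTimestampMicros, int localDeletionTimeSecs, boolean isTombstone) {
            this.writeTimestampMicros = writeTimestampMicros;
            this.localDeletionTimeSecs = localDeletionTimeSecs;
            this.isTombstone = isTombstone;
        }

        // Last-write-wins: when two versions of the same cell meet during a
        // compaction, the one with the higher write timestamp survives,
        // whether or not either of them is a tombstone.
        static Cell reconcile(Cell a, Cell b) {
            return a.writeTimestampMicros >= b.writeTimestampMicros ? a : b;
        }

        // A tombstone may be purged only once gc_grace has elapsed AND this
        // compaction is known to cover every sstable that could still hold
        // data the tombstone shadows.
        boolean purgeable(int nowSecs, int gcGraceSecs, boolean coversAllShadowedData) {
            return isTombstone
                && localDeletionTimeSecs + gcGraceSecs < nowSecs
                && coversAllShadowedData;
        }
    }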
On Wed, Apr 29, 2015 at 2:42 PM, Eric Stevens <migh...@gmail.com> wrote:

> But we're talking about a single tombstone on each of a finite (small) set of values, right? We're not talking about INSERTs which are 99% nulls (at least I don't think that's what Matthew was suggesting). Unless you're engaging in the antipattern of repeated overwrite, I'm still struggling to see why this is worse than an equivalent number of non-tombstoned writes. In fact, from the description I don't think we're talking about these tombstones even occluding any value at all.
>
> > imagine a multi-TB sstable w/ 99% tombstones
>
> Let's play with this hypothetical, which doesn't seem like a probable consequence of the original question. You'd have to have taken enough writes *inside* gc grace period to have even produced a multi-TB sstable to come anywhere near this, and even then this either exceeds or comes really close to the recommended maximum total data size per node (let alone in a single sstable). If you did have such an sstable, it doesn't seem very likely to compact again inside gc grace period short of a manually triggered major compaction.
>
> But let's assume you do that: you run cassandra-stress inserting nothing but tombstones, and kick off major compaction periodically. If it compacted inside gc grace period, is this worse for compaction than the same number of non-tombstoned values (i.e. a multi-TB sstable is costly to compact no matter what the contents)? If it compacted outside gc grace period, then 99% of the work is just dropping tombstones; it seems like it would run really fast (for being an absurdly large sstable), as there would be just 1% of the contents to actually copy over to the new sstable.
>
> I'm still not clear on what I'm missing. Is a tombstone more expensive to compact than a non-tombstone?
>
> On Wed, Apr 29, 2015 at 10:06 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Enough tombstones can inflate the size of an SSTable, causing issues during compaction (imagine a multi-TB sstable w/ 99% tombstones), even if there's no clustering key defined.
>>
>> Perhaps an edge case, but worth considering.
>>
>> On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Correct me if I'm wrong, but tombstones are only really problematic if you have them going into clustering keys and then perform a range select on that column, right (assuming it's not a symptom of the antipattern of indefinitely overwriting the same value)? I.e. you're deleting clusters off of a partition. A tombstone isn't any more costly than a normal column, and is in some ways less costly (it's smaller at rest than, say, inserting an empty string or other default value, as someone suggested).
>>>
>>> Tombstones stay around a little longer post-compaction than other values, so that's a downside, but they would also drop off the record, as if they had never been set, on the next compaction after gc grace period.
>>>
>>> Tombstones aren't intrinsically bad, but they can have some bad properties in certain situations. This doesn't strike me as one of them. If you have a way to avoid inserting null when you know you aren't occluding an underlying value, that would be ideal.
>>> But because the tombstone would sit adjacent on disk to other values from the same insert, even if you were on platters, the drive head is *already positioned* over the tombstone's location when it's read, because it read the prior and subsequent values that were written during the same insert.
>>>
>>> In the end, inserting a tombstone into a non-clustered column shouldn't be appreciably worse (if it is at all) than inserting a value instead. Or am I missing something here?
>>>
>>> On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson <matt.john...@algomi.com> wrote:
>>>
>>>> Thank you all for the advice!
>>>>
>>>> I have decided to use the Insert query builder (*com.datastax.driver.core.querybuilder.Insert*), which allows me to dynamically insert as many or as few columns as I need and doesn't require multiple prepared statements. Then I will look at Ali's suggestion: I will create a small helper method like 'addToInsertIfNotNull' and pump all my values through it, which will filter out the ones that are null. That should keep the code nice and neat. I will report back if I find any problems with this approach (but please jump in if you have already spotted any :)).
>>>>
>>>> Thanks!
>>>> Matt
>>>>
>>>> *From:* Robert Wille [mailto:rwi...@fold3.com]
>>>> *Sent:* 29 April 2015 15:16
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: Inserting null values
>>>>
>>>> I've come across the same thing. I have a table with at least half a dozen columns that could be null, in any combination. Having a prepared statement for each permutation of null columns just isn't going to happen. I don't want to build custom queries each time because I have a really cool system of managing my queries that relies on them being prepared.
>>>>
>>>> Fortunately for me, I should have at most a handful of tombstones in each partition, and most of my records are written exactly once. So I just let the tombstones get written; they'll eventually get compacted out and life will go on.
>>>>
>>>> It's annoying and not ideal, but what can you do?
>>>>
>>>> On Apr 29, 2015, at 2:36 AM, Matthew Johnson <matt.john...@algomi.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don't check each one for null before adding it to the INSERT.
>>>>
>>>> I can see a few JIRAs around CQL 3 supporting inserting nulls:
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-3783
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5648
>>>>
>>>> But I have tested inserting null and it seems to work fine (when querying the table with cqlsh, it shows up as a red lowercase *null*).
>>>>
>>>> Are there any obvious pitfalls to look out for that I have missed? Could it be a performance concern to insert a row with some nulls, as opposed to checking the values first and omitting the null columns from the insert?
>>>>
>>>> Thanks!
>>>> Matt
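For reference, a minimal sketch of the 'addToInsertIfNotNull' helper Matthew describes above. The keyspace, table, and column names here are invented; it assumes the DataStax Java driver's query builder, whose Insert.value() appends a column/value binding:

    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.querybuilder.Insert;
    import com.datastax.driver.core.querybuilder.QueryBuilder;

    public class NullSafeInsert {

        // Bind the column only when the value is non-null, so absent columns
        // are simply omitted and no tombstone is written for them.
        static Insert addToInsertIfNotNull(Insert insert, String column, Object value) {
            return value == null ? insert : insert.value(column, value);
        }

        // Example usage; "my_ks", "users", and the columns are hypothetical.
        static void save(Session session, String id, String email, String phone) {
            Insert insert = QueryBuilder.insertInto("my_ks", "users")
                                        .value("id", id);  // the primary key must always be present
            insert = addToInsertIfNotNull(insert, "email", email);
            insert = addToInsertIfNotNull(insert, "phone", phone);
            session.execute(insert);
        }
    }

The trade-off, as Robert notes above, is that the statement is built per insert rather than prepared once, since every permutation of present columns yields a different query.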