Tombstone management is a big conversation. You can manage it in one of the
following ways (rough sketches of both follow after the list):

1) Set gc_grace_seconds to 0 and then run nodetool compact (while using
size tiered compaction) as frequently as needed. This is often a pretty
lousy solution: with gc_grace_seconds at 0 you lose the grace period that
protects you when a node is down during a delete, so it's easy to bring
data back from the dead if you don't manage how you bring nodes back online
correctly. Also, nodetool compact is super I/O intensive. I don't recommend
this approach unless you're already very operationally sound.
2) Partition your data using a scheme that matches your domain model. It
sounds like you're using a queue approach, and by and large a distributed
database that relies on tombstones is going to struggle with that by
default. I have, however, worked with a number of customers that use
Cassandra as a queue at scale, and I detailed the modeling workarounds here:
http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
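
To make option 1 concrete, the knobs involved look roughly like this (the
my_ks.my_queue names are just placeholders, not anything from this thread):

  ALTER TABLE my_ks.my_queue
    WITH gc_grace_seconds = 0
    AND compaction = {'class': 'SizeTieredCompactionStrategy'};

and then, as often as you can tolerate the I/O hit:

  nodetool compact my_ks my_queue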

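And a hypothetical sketch of the bucketing idea behind option 2 (table and
column names are made up here; the post goes into more detail). Consumers
only ever read the current bucket, so they never scan across the tombstones
piling up in older buckets, and old buckets can simply age out:

  CREATE TABLE my_ks.queue_items (
      bucket   text,      -- e.g. '2015-01-06-14', one bucket per hour
      item_id  timeuuid,
      payload  blob,
      PRIMARY KEY ((bucket), item_id)
  );

  SELECT item_id, payload
    FROM my_ks.queue_items
   WHERE bucket = '2015-01-06-14'
   LIMIT 100;
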
On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote:

> Hi Eduardo,
>
> Quoting Eduardo Cusa <eduardo.c...@usmediaconsulting.com>:
>
>>  [...]
>> I have to worry about the tombstones generated?  Considering that I will
>> have many daily set updates
>>
>
> that depends on your definition of "many"... we've run into a situation
> where we wanted to age out old data using TTL... unfortunately, we ran into
> the "tombstone_failure_threshold" limit rather quickly, having thousands of
> record updates per second. That left us with a CF containing millions of
> records that we couldn't "select" the way we originally intended.
>
> Regards,
> Jens
>
>


-- 

Thanks,
Ryan Svihla
