This is a bit difficult.  Depending on your access patterns and data
volume, I'd be inclined to keep a separate table with a (count,
foreign_key) clustering key, then do a client-side join to read the data
back in the order you're looking for.  That keeps updates to the heavily
updated table cheap, at the cost of an extra lookup at read time.  At
least the related values that haven't changed don't need to be deleted
and reinserted every time this one value changes.
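
A minimal sketch of what that separate table might look like (the names
here are made up for illustration; item_id stands in for the foreign key
back to your main table, and bucket is a fixed partition key value):

```sql
-- Hypothetical sort table: rows cluster by count within a partition,
-- so a single range read comes back already sorted.
CREATE TABLE items_by_count (
    bucket  int,
    count   int,
    item_id uuid,
    PRIMARY KEY ((bucket), count, item_id)
) WITH CLUSTERING ORDER BY (count DESC, item_id ASC);

-- Read the top N in count order, then fetch the item details from
-- your main table client-side:
SELECT count, item_id FROM items_by_count WHERE bucket = 0 LIMIT 100;
```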

But like you said, this is a read-then-write operation; over time you'll
accumulate a lot of tombstones, and your data's accuracy may suffer.  I
would also recommend rotating your partition keys and having a background
process that trues up your object-by-count table into a new partition key
on some schedule you determine.  Live updates write to partition keys *n*
and *n*+1, your truing-up process trues up *n*+1, and then your readers
switch to reading from *n*+1.  When all readers are done with *n*, you can
delete the whole row, and because nobody is reading from that row any
longer, it doesn't matter how many tombstones it accumulated.  I suggest
using a timestamp for the partition key so it's easy to reason about, and
you can rotate it on a schedule that makes sense for you.
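
A rough sketch of that rotation, assuming a table keyed by a timestamp
bucket (table and column names are hypothetical, and the delete of each
old count row is omitted for brevity):

```sql
-- Hypothetical rotating table: the partition key is the bucket's
-- start time, so each rotation period gets its own partition.
CREATE TABLE items_by_count_rotating (
    bucket  timestamp,
    count   int,
    item_id uuid,
    PRIMARY KEY ((bucket), count, item_id)
) WITH CLUSTERING ORDER BY (count DESC, item_id ASC);

-- Live updates dual-write to the current bucket n and the next bucket n+1:
INSERT INTO items_by_count_rotating (bucket, count, item_id)
  VALUES ('2014-12-27 00:00:00+0000', 42, 5132b130-8da0-11e4-b4a9-0800200c9a66);
INSERT INTO items_by_count_rotating (bucket, count, item_id)
  VALUES ('2014-12-28 00:00:00+0000', 42, 5132b130-8da0-11e4-b4a9-0800200c9a66);

-- Once every reader has moved on to n+1, drop the old partition in one
-- shot; its tombstones no longer matter because nothing reads it:
DELETE FROM items_by_count_rotating WHERE bucket = '2014-12-27 00:00:00+0000';
```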

If there's heavy write contention, your data will end up being always off
by a little bit (due to race conditions between the truing up process and
the live process), but will correct itself over time.

On Sat, Dec 27, 2014 at 10:15 AM, ziju feng <pkdog...@gmail.com> wrote:

> I need to sort data on a frequently updated column, such as the like count
> of an item. The common way of getting data sorted in Cassandra is to have
> the column to be sorted on as a clustering key. However, whenever such a
> column is updated, we need to delete the row with the old value and insert
> the new one, which can not only generate a lot of tombstones, but also
> require a read-before-write if we don't know the original value (such as
> when using a counter table to maintain the count and propagating it to the
> table that needs to sort on the count).
>
> I was wondering what the best practice is for such a use case? I'm
> currently using DSE search to handle it, but I would like to see a
> Cassandra-only solution.
>
> Thanks.
>
