No, I don't store events in Cassandra.


What I'm really doing is counting stuff: each event has a type, an
associated user, and some other metadata. When I process an event I need
to increment the corresponding counters, but only if the event hasn't
already been processed. Our input event stream is Kafka, and it's not
uncommon that we get the same event twice because our client apps aren't
reliable.
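
To make it concrete, the counters are plain Cassandra counter columns,
something like this (schema simplified and names made up for the
example):

    CREATE TABLE event_counts (
        event_type text,
        user_id    uuid,
        count      counter,
        PRIMARY KEY ((event_type, user_id))
    );

    -- bump the counters once per event, assuming the event is new
    UPDATE event_counts SET count = count + 1
    WHERE event_type = ? AND user_id = ?;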


Right now I haven't found a good solution to this that doesn't involve a
read before write, but I'd love to hear your suggestions.
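
(For context, the closest CQL-level trick I know of is a lightweight
transaction, which folds the check and the insert into a single
statement, but the Paxos round under the hood still amounts to a read
before write, so it doesn't really avoid the cost:

    INSERT INTO myset (id) VALUES (?) IF NOT EXISTS;

The [applied] column in the result says whether the id was actually
inserted, i.e. whether this is the first time we've seen the event.)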




On Mon, Feb 27, 2017, at 12:01 PM, Vladimir Yudovin wrote:

> Do you also store events in Cassandra? If yes, why not add a
> "processed" flag to the existing table(s) and fetch non-processed
> events with a single SELECT?
>
> Best regards, Vladimir Yudovin,
> *Winguzone[1] - Cloud Cassandra Hosting*
>
> ---- On Fri, 24 Feb 2017 06:24:09 -0500 *Vincent Rischmann
> <m...@vrischmann.me>* wrote ----
>
>> Hello,
>>
>> I'm using a table like this:
>>
>>    CREATE TABLE myset (id uuid PRIMARY KEY);
>>
>> which is basically a set I use for deduplication: each id uniquely
>> identifies an event. Before processing an event I check whether its id
>> is already in the table, and when I process it I insert the id.
>>
>> It works well enough, but I'm wondering which compaction strategy I
>> should use. I expect maybe 1% or less of events will end up duplicated
>> (thus not generating an insert), so the workload will probably be
>> about 50% writes and 50% reads.
>>
>> Is LCS a good strategy here, or should I stick with STCS?

Links:

  1. https://winguzone.com?from=list
