Max, 

This sounds a bit complex. What would need to happen if you didn't process an 
event (or batch of events) in time?  What about using time-based expiry for 
your events, which is supported by the Bitcask backend?  You could use the 
multi-backend to set up a bucket whose contents expire after N seconds.  When 
you write the last event in a batch, also write a key/value pair to the 
expiring bucket whose value is the list of keys that were in that batch.  Make 
the key meaningful enough that your program doesn't have to look it up; it can 
derive it from other context.
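
For example, a rough sketch of that batch-marker write using the riak-erlang-client
(riakc) might look like the following. The bucket name, key scheme, and values are
made up for illustration, and the "expiring_batches" bucket is assumed to sit on an
expiring Bitcask backend (see the config sketch further down):

%% connect to a local node over protocol buffers
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).

%% the marker key is derived from context the sweeper already knows
%% (the group key), so it never has to be looked up
GroupKey = <<"group-42">>.
BatchKeys = [<<"event-1">>, <<"event-2">>, <<"event-3">>].

%% value = the list of keys written in this batch; the whole object
%% goes away on its own once expiry_secs has elapsed
Marker = riakc_obj:new(<<"expiring_batches">>,
                       <<"batch-", GroupKey/binary>>,
                       term_to_binary(BatchKeys)).
ok = riakc_pb_socket:put(Pid, Marker).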

see: http://wiki.basho.com/Bitcask.html

Automatic Expiration 
By default, Bitcask keeps all of your data around. If your data has limited 
time-value, or if for space reasons you need to purge data, you can set the 
expiry_secs option. If you need to purge data automatically after 1 day, set 
the value to 86400.
Default is -1, which disables automatic expiration.


{bitcask, [ ..., {expiry_secs, -1}, %% Don't expire items based on time ... ]} 
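
Putting those two pieces together, the multi-backend section of app.config might 
look roughly like this (the backend names, data_root paths, and the 60-second 
expiry below are placeholders, not required values):

{riak_kv, [
    %% ...
    {storage_backend, riak_kv_multi_backend},
    {multi_backend_default, <<"bitcask_default">>},
    {multi_backend, [
        %% regular, non-expiring storage
        {<<"bitcask_default">>, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask"}
        ]},
        %% everything written here is expired roughly N seconds later
        {<<"bitcask_expiring">>, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask_expiring"},
            {expiry_secs, 60}   %% N = 60, for example
        ]}
    ]}
]}

You'd then point the batch-marker bucket at the expiring backend by setting its 
"backend" bucket property to "bitcask_expiring".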



@gregburd
Developer Advocate, Basho Technologies | http://basho.com | @basho


On Tuesday, May 15, 2012 at 1:56 PM, Max Ivanov wrote:

> Hi,
> 
> What's the best approach to process a batch of events N seconds after the
> latest event in a group happens? Events are grouped by key.
> 
> I am thinking about the following scheme:
> 
> 1) events are recorded in a way that every write creates a new sibling,
> avoiding multiple read/write cycles per event
> 2) with every write, a new secondary index entry is created with value =
> "sweep_at_$current_time + N"
> 3) every second, a process queries Riak for secondary index entries with
> values <= "sweep_at_$current_time"
> 4) for every item returned, it fetches all of its siblings:
> - if there are siblings, merge them into 1 record, then calculate and
> write a new secondary index "sweep_at_$latest_sibling_time + N". Go to the
> next substep if the newly calculated timeout value is <= the current time.
> - if there are no siblings, process the record and remove the key from Riak
> 
> Therefore, for every batch of N events on average (given that 99% of
> event batches' timespans are less than N) there will be:
> N+1 writes, 2 secondary index seeks, and 2 reads
> 
> Is this a correct approach for Riak? It could be improved further by
> carefully setting the secondary index in stage 2 so that the merge of all
> siblings is immediately followed by processing of the event batch, but
> right now I am more interested in whether it fits nicely into Riak.
> 
> Thank you.
> 


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
