If it's a sliding 30-minute window, you will need to implement it yourself and keep an in-memory timestamp list, and out-of-order messages will always be a headache. If you are ok with a fixed 30-minute window (every 30 minutes, e.g. 5:00, 5:30, 6:00, ...), then just add a time bucket to the partition key and you are done. Out-of-order messages go into their time-bucket partitions and that's it. No need to read before write or to worry about consistency. It depends on what your requirements are.
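To make the fixed-window idea concrete, here is a minimal sketch (Java, purely illustrative; the sensor_id / bucket / event_time names in the comment are made up) of how you might derive the 30-minute bucket that becomes part of the partition key. Note the trade-off: with fixed windows the 5:30 PM message lands in its own 5:30 bucket rather than in the 5:00 group.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class TimeBucket {

    // Truncate an event timestamp down to the start of its fixed 30-minute window.
    // Assumes UTC; use whatever zone your data is actually keyed on.
    static Instant bucketFor(Instant eventTime) {
        ZonedDateTime utc = eventTime.atZone(ZoneOffset.UTC);
        int flooredMinute = (utc.getMinute() / 30) * 30;   // 0 or 30
        return utc.truncatedTo(ChronoUnit.HOURS)
                  .plusMinutes(flooredMinute)
                  .toInstant();
    }

    public static void main(String[] args) {
        // The timestamps from the question: 5:00, 5:05, 5:10, 5:30, 6:20 PM (shown here as UTC)
        String[] samples = {
                "2017-03-18T17:00:00Z", "2017-03-18T17:05:00Z", "2017-03-18T17:10:00Z",
                "2017-03-18T17:30:00Z", "2017-03-18T18:20:00Z"};
        for (String s : samples) {
            Instant t = Instant.parse(s);
            // bucketFor(t) would go into the partition key, e.g. (hypothetical schema):
            //   PRIMARY KEY ((sensor_id, bucket), event_time)
            System.out.println(s + " -> bucket " + bucketFor(t));
        }
    }
}

The 17:00, 17:05 and 17:10 events all map to the 17:00 bucket, 17:30 maps to the 17:30 bucket, and 18:20 maps to the 18:00 bucket, so each write is independent and no read-before-write is needed.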
On Sat, Mar 18, 2017 at 6:27 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> I have a use case where a stream of time series data is coming in.
>
> Each item in the stream has a timestamp of when it was sent, and covers
> the activity that happened within a 5 minute timespan.
>
> I need to group the items together into 30 minute blocks of time.
>
> E.g, say I receive the following items:
>
> 5:00 PM, 5:05 PM, 5:10 PM... 5:30 PM, 6:20 PM
>
> I need to group the messages from 5:00 PM to 5:30 PM into one block, and
> put the 6:20 PM message into another block.
>
> It seems simple enough to do, if for each message, I look up the last
> received message. If it was within 30 minutes, then the message goes into
> the current block. Otherwise, a new block is started.
>
> My concern is about messages that arrive out of order, or are processed
> concurrently.
>
> Saving and reading them with Consistency=ALL would be bad for performance,
> and I've had issues where queries have failed due to timeouts with those
> settings (and timeouts can't be increased on a per query basis).
>
> Would it be better to use Redis, or another database, to use as a helper /
> companion to C*?
>
> Or perhaps, all messages should just be stored first, and then ~30 minutes
> later, a job is run which gets all messages within last 30 mins, sorts them
> by time, and then sorts them into blocks of time?