Hi Jelmer, Three comments, if I understand your use case correctly…
1. I would first try using RockDB with incremental checkpointing <https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html>, before deciding that an alternative approach is required. 2. Have you considered using queryable state <https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/queryable_state.html> vs. also keeping the list of events in Cassandra? 3. Depending on what you need the list of events for, often you can apply a streaming algorithm to get good-enough (approximate) results without storing complete state. — Ken > On May 7, 2018, at 5:29 AM, jelmer <jkupe...@gmail.com> wrote: > > Hi I am looking for some advice on how to solve the following problem > > I'd like to keep track of the all time last n events received for a user. > An event on average takes up 500 bytes and here will be ten's of millions of > users for which we need to keep this information. The list of events will be > stored in cassandra for serving by an api > > One way I can think of to implement this , is to use a global window per user > with a count evictor. > > The problem I see with this is that the state would forever remain on the > worker nodes, in our case, in rocks db. > > This means there would be a *lot* of state to include for savepoints and > checkpoints. This would make such a job very unwieldy to operate. > > Is it possible to evict state from global state after some period of > inactivity. and then reinitalize the global state with data loaded from > cassandra when a new event arrives ? > > Or is there an obvious better way to tackle this problem that i am missing > > Any pointers would be greatly appreciated > > -------------------------------------------- http://about.me/kkrugler +1 530-210-6378