Hi David,

you should be able to solve this kind of problem with Flink's CEP library. The important thing here is to define a time window for the pattern (via `within`) so that partial matches can time out. Otherwise, you will end up accumulating state which is never purged, and this will eventually cause an out-of-memory (OOM) error.
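To illustrate why the timeout matters, here is a minimal sketch in plain Python. It is not Flink's API: the class `AbandonedCartDetector`, the event names `add_to_cart` and `purchase`, and the `on_watermark` method are all hypothetical names chosen for illustration. It only shows the idea that each key holds a partial match which is either completed or purged once the window has elapsed, so state stays bounded.

```python
from dataclasses import dataclass, field

# Assumed window length, taken from the "5 days" in the question.
FIVE_DAYS_MS = 5 * 24 * 60 * 60 * 1000


@dataclass
class AbandonedCartDetector:
    """Toy per-key matcher: 'add_to_cart' with no 'purchase' within timeout_ms."""

    timeout_ms: int = FIVE_DAYS_MS
    # user_id -> timestamp of the first unmatched 'add_to_cart' (the partial match)
    pending: dict = field(default_factory=dict)

    def on_event(self, user_id, event_type, ts_ms):
        if event_type == "add_to_cart":
            # Start a partial match; keep the earliest start so the window is not extended.
            self.pending.setdefault(user_id, ts_ms)
        elif event_type == "purchase":
            start = self.pending.get(user_id)
            # A purchase inside the window cancels the partial match and frees its state.
            if start is not None and ts_ms - start < self.timeout_ms:
                del self.pending[user_id]

    def on_watermark(self, now_ms):
        """Purge partial matches whose window has elapsed and return their keys."""
        expired = [u for u, start in self.pending.items()
                   if now_ms - start >= self.timeout_ms]
        for u in expired:
            del self.pending[u]
        return expired


detector = AbandonedCartDetector()
detector.on_event("u1", "add_to_cart", 0)
detector.on_event("u2", "add_to_cart", 0)
detector.on_event("u2", "purchase", 1000)
abandoned = detector.on_watermark(FIVE_DAYS_MS)
```

Without the purge in `on_watermark`, every user who ever added an item and never purchased would stay in `pending` forever, which is the unbounded state growth described above.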
How complex would a pattern be (how many stages, what kind of payload)? Depending on this, we should be able to estimate the resource requirements. Alternatively, you could give it a try and see how few machines the cluster can be reduced to.

Great to hear that you enjoyed the conference :-)

Cheers,
Till

On Thu, Sep 15, 2016 at 6:13 PM, David Koch <ogd...@googlemail.com> wrote:

> Hello,
>
> Is FlinkCEP applicable to large key spaces with potentially long timeouts
> between events that define a pattern? Ideally, without ridiculous hardware.
>
> More concretely, we segment users (one key per user) based on sequences of
> events for that user.
>
> A segment "Abandoned Cart" may be defined by adding items during a
> browsing session but no purchase event within the following 5 days. The
> number of users is between 1 and 10 million.
>
> Is this type of segmentation scenario a viable use case for FlinkCEP?
>
> We currently segment by building incremental profiles in ES which are then
> "matched against segment definition queries" using ES percolators. In
> short, we incur costs when interacting with ES.
>
> Regards,
>
> David
>
>
> PS: Thanks for FlinkForward 2016, very interesting presentations and
> equally important excellent catering ;-)
>