Hi Kevin, I'm happy to hear that Flink performs well for your use-cases!
I'm not sure I understand what you mean by a metadata-driven window trigger. What is an example of metadata that would trigger a window?

Why would you need global state to filter duplicates from a stream? I assume that you can just partition the stream and keep the elements you've already seen in the local state?

Have you seen this Flink Improvement Proposal https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata and the associated discussion thread? I'm not sure whether it covers your use case.

Regards,
Robert

On Fri, Aug 12, 2016 at 9:45 AM, Kevin Jacobs <kevin.jac...@cern.ch> wrote:

> Hi,
>
> Today I will be giving a presentation about Apache Flink, and for the use
> cases at my company, Apache Flink performs better than Apache Spark. There
> is only one issue I encountered, and that is the lack of support for
> (Meta)data Driven Window Triggers.
>
> I would like to start a discussion on this. In my opinion, it is fairly
> easy to implement (Meta)data Driven Window Triggers by making use of the
> state mechanism implemented in Apache Flink.
>
> Most of the time, the global state is just a small subset of the data of
> one or more streams: only a few fields of the original stream need to be
> tracked. A StateExtractor class could therefore extract the necessary
> fields from the original stream(s) and store them in a global state. A
> (Meta)data Driven Window Trigger is then straightforward to implement,
> since it can make use of the elements collected by the StateExtractor.
>
> One use case in which (Meta)data Driven Window Triggers could be useful is
> filtering duplicates from a stream.
>
> Just my idea :-). What are your thoughts?
>
> Regards,
> Kevin
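The suggestion to partition the stream and keep already-seen elements in local state, rather than in a global state, can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not Flink code: the class `KeyedDedupSketch`, its `parallelism` parameter, and the hash-based routing are hypothetical stand-ins for `keyBy()` and per-key state. Because every element with the same key is routed to the same partition, each partition only has to remember the keys it has seen itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical plain-Java model (NOT the Flink API) of deduplication with
 * partitioned local state. Elements are routed to a partition by the hash of
 * their key, similar in spirit to keyBy(); each partition keeps its own set of
 * seen keys, so no state is shared between partitions.
 */
public class KeyedDedupSketch {

    // Assumption: stands in for the parallelism of the deduplicating operator.
    private final int parallelism;
    // One "local state" per partition: the keys that partition has already seen.
    private final List<Set<Object>> seenPerPartition = new ArrayList<>();

    public KeyedDedupSketch(int parallelism) {
        this.parallelism = parallelism;
        for (int i = 0; i < parallelism; i++) {
            seenPerPartition.add(new HashSet<>());
        }
    }

    /** Returns true if the element should be emitted, i.e. its key is new. */
    public <T, K> boolean accept(T element, Function<T, K> keySelector) {
        K key = keySelector.apply(element);
        // The same key always maps to the same partition, so a local lookup suffices.
        int partition = Math.floorMod(key.hashCode(), parallelism);
        // Set.add returns false if the key was already present in this partition.
        return seenPerPartition.get(partition).add(key);
    }

    /** Runs a finite list through the model and returns the de-duplicated output. */
    public static <T, K> List<T> dedup(List<T> stream, Function<T, K> keySelector, int parallelism) {
        KeyedDedupSketch sketch = new KeyedDedupSketch(parallelism);
        List<T> out = new ArrayList<>();
        for (T element : stream) {
            if (sketch.accept(element, keySelector)) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> events = List.of("id-1", "id-2", "id-1", "id-3", "id-2");
        System.out.println(dedup(events, e -> e, 4)); // prints [id-1, id-2, id-3]
    }
}
```

In an actual Flink job, the per-partition set would typically be replaced by keyed state, for example a `ValueState<Boolean>` flag read and updated in a function applied after `keyBy()`, which Flink scopes to the current key and includes in checkpoints. Either way, the amount of state grows with the number of distinct keys, so a long-running job would likely need some way to expire old entries.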