Hi, It'd really help if I got a reply soon. It'll be helpful in writing the proposal since the deadline is on 27th. Thanks Regards, Akshay Dixit
On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com> wrote: > Thanks for the explanation Marton. I've decided to try out for FLINK-1534. > > After reading through the thesis[4] and a few other papers[1][2][3], I > believe I've gathered a little context to ask more questions. But I'm still > not sure how Flink's internals work > so please bear with me. Although the ongoing effort to document the > architecture and internal is really helpful for newbies like me and would > greatly decrease the ramping up time. > > Detecting a pattern of events would comprise of a pipeline that accepts > the pattern query and > sources of DataStreams, and outputs detected matches of that pattern to a > sink or forwards it > along to another stream for further computation. > > As you said, a simple filter-join-aggregate query system could be > developed implementing using the existing Streaming windowing API. > But matching over complex events and decoding their pattern queries would > require implementing a DSL that transforms queries into an evaluation > model. For e.g, > in [1], the authors have implemented an NFA automaton with a shared > versioned buffer that models the queries. In [4], the authors > propose a new language that is much more expressive and compiles into a > topology graph for Storm. > > So in Flink's case, I believe the proposed DSL would generate operator > graphs for the Flink compiler to schedule Jobgraphs over TaskManagers. > If we don't depend on the Windowing API, would we need to create new > operators such as the Projection, Conjunction and Union operators defined > in [4] ? > Also I would like to hear your thoughts on how to approach scaling the > pattern matching query. Note all these techniques talk about scaling a > single query. > I've read various ways such as > > 1. Merging equivalent runs[1] -: This seems a good way to squash multiple > instances of pattern matching forks into a single one if they have the same > state. > But I'm not sure how we would implement this in Flink since this is a > runtime optimization. > > 2. Implementing a matched version buffer[1] -: This would involve sharing > state of a buffer datastructure across multiple candidate match instances > for the pattern. > > 3. Splitting complex composite patterns into simpler sub-patterns[4] and > executing separate queries to detect those sub-patterns. This might > translate into different > tasks and duplicating the source datastreams to all the new generated > tasks. > > Also since I don't know how the Flink compiler behaves, would some of the > optimizations involve making changes to it too? > > Regards, > Akshay Dixit > > [1] : Efficient Pattern Matching over Event Streams > <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf> > [2] : On Supporting Kleene Closure over Event Streams > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf> > [3] : Processing Flows of Information: From Data Stream to Complex Event > Processing > <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf> > [4] : Distributing Complex Event Detection > <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf> > > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <balassi.mar...@gmail.com> > wrote: > >> Dear Akshay, >> >> Thanks again for your interest and for the recent contribution to >> streaming. >> >> Both of the projects mentioned wold be largely appreciated by the >> community, and you can also propose other project suggestions here for >> discussion. >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting point >> and >> indeed the basic solution can be implemented with filtering and >> windowing/mapping with some state storing whether the cause of an event >> has >> been already seen. Solely relying on the now existing windowing API this >> however might cause performance issues if the events also have an >> expiration timeout - some optimization there would be included. The >> further >> challenge is to try to further exploit the parallel job execution of Flink >> to possibly scale a pattern matching query. >> >> Best, >> >> Marton >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com> >> wrote: >> >> > Hi, >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm >> > currently interested in distributed systems and stream processing and am >> > looking to delve deeper into the subject, and hope to get some insight >> by >> > contributing to Apache Flink. I've gathered some idea of the >> > flink-streaming codebase by recently working on a PR for FLINK-1450[2]. >> > >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I >> would >> > love to work on over the summer. I was wondering which amongst these >> would >> > be more appreciated by the community, so I can start working towards a >> > proposal for either one. >> > >> > Regarding FLINK-1534, I was wondering why would simply merging and >> > filtering the existing streams for events we want to detect not work? >> Also >> > on going through the document mentioned by @mbalassi in the JIRA >> > comment[5], the authors specify some Runtime Event Detection concepts in >> > Section 5.2. I'm assuming the project entails on building a similar >> analogy >> > using Flink and the deliverables would include working pattern matching >> > operators over Flink DataStreams as described in the report. If so, then >> > shouldn't it be trivial to implement the described the Binary operator >> > using a WindowedStream and a Filter? >> > I hope my questions don't seem misplaced here and I would appreciate >> links >> > to literature where I can learn more on the topic. >> > >> > Regards, >> > Akshay Dixit >> > >> > [1] : http://akshaydixi.me >> > [2] : https://github.com/apache/flink/pull/481 >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617 >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534 >> > [5] : >> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf >> > >> > >