Hi, The link to the draft proposal that I've prepared is https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31 It would be great if I could get some feedback on it. Regards, Akshay Dixit
On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <akshayd...@gmail.com> wrote: > Thanks Gyula. > > I agree too that simple and working implementations are preferrable over > hacky complex solutions. I'll start sketching out an initial straighforward > API with only basic pattern matching features > and base it on the existing windowing API. I'll post a draft of the > proposal, keeping the points you've said in mind, tomorrow, so you can > look it over to see if its all right. > Regards, > Akshay Dixit > > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gyf...@apache.org> wrote: > >> Hey Dixit, >> >> Sorry for the delay, I had to discuss this in more detail with some of our >> other core developers. >> >> The consensus seems to be that we would like push this project in a >> direction where the changes can be quickly included in the next releases. >> For this it is essential that we implement features that are complete (and >> clean) from the users perspective. This does not necessarily mean that we >> would like to have everything at once but rather that it is preferable to >> start with something clean and simple (for instance the naive chained >> filter approach) and progressively build more complex logic. >> >> This also mean that we would like to avoid "researchy" code in the >> codebase >> as much as possible. Of course once we have a stable api for this >> functionality we can work towards making the optimizations that you have >> mentioned like operator sharing and so on. >> >> The ideal proposal would give a clear sketch of the pattern matching API >> that you would like to implement, which might be some added operators at >> first to the current API and possible a DSL later with more advanced >> functionality (this would probably go in a separate library until it is >> very stable). >> >> So please in the proposal include a preview of what the pattern matching >> syntax would look like integrated with the current operators, how it would >> interact with other parts of the system etc. >> >> These are the thing we need to figure out before we consider the >> optimizations I think, because it usually turns out, that the API >> semantics >> you would like to provide can hugely affect (probably limit) the >> possibilities that you have afterwards in terms of optimizations. >> >> Let me know if you have further questions regarding this :) >> >> Gyula >> >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyf...@apache.org> wrote: >> >> > Hey, >> > >> > Give me an hour or so as I am in a meeting currently, but I will get >> back >> > to you afterwards. >> > >> > Regards, >> > Gyula >> > >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit <akshayd...@gmail.com> >> > wrote: >> > >> >> Hi, >> >> It'd really help if I got a reply soon. It'll be helpful in writing the >> >> proposal since the deadline is on 27th. Thanks >> >> Regards, >> >> Akshay Dixit >> >> >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com> >> >> wrote: >> >> >> >> > Thanks for the explanation Marton. I've decided to try out for >> >> FLINK-1534. >> >> > >> >> > After reading through the thesis[4] and a few other papers[1][2][3], >> I >> >> > believe I've gathered a little context to ask more questions. But I'm >> >> still >> >> > not sure how Flink's internals work >> >> > so please bear with me. Although the ongoing effort to document the >> >> > architecture and internal is really helpful for newbies like me and >> >> would >> >> > greatly decrease the ramping up time. >> >> > >> >> > Detecting a pattern of events would comprise of a pipeline that >> accepts >> >> > the pattern query and >> >> > sources of DataStreams, and outputs detected matches of that pattern >> to >> >> a >> >> > sink or forwards it >> >> > along to another stream for further computation. >> >> > >> >> > As you said, a simple filter-join-aggregate query system could be >> >> > developed implementing using the existing Streaming windowing API. >> >> > But matching over complex events and decoding their pattern queries >> >> would >> >> > require implementing a DSL that transforms queries into an evaluation >> >> > model. For e.g, >> >> > in [1], the authors have implemented an NFA automaton with a shared >> >> > versioned buffer that models the queries. In [4], the authors >> >> > propose a new language that is much more expressive and compiles >> into a >> >> > topology graph for Storm. >> >> > >> >> > So in Flink's case, I believe the proposed DSL would generate >> operator >> >> > graphs for the Flink compiler to schedule Jobgraphs over >> TaskManagers. >> >> > If we don't depend on the Windowing API, would we need to create new >> >> > operators such as the Projection, Conjunction and Union operators >> >> defined >> >> > in [4] ? >> >> > Also I would like to hear your thoughts on how to approach scaling >> the >> >> > pattern matching query. Note all these techniques talk about scaling >> a >> >> > single query. >> >> > I've read various ways such as >> >> > >> >> > 1. Merging equivalent runs[1] -: This seems a good way to squash >> >> multiple >> >> > instances of pattern matching forks into a single one if they have >> the >> >> same >> >> > state. >> >> > But I'm not sure how we would implement this in Flink since this is a >> >> > runtime optimization. >> >> > >> >> > 2. Implementing a matched version buffer[1] -: This would involve >> >> sharing >> >> > state of a buffer datastructure across multiple candidate match >> >> instances >> >> > for the pattern. >> >> > >> >> > 3. Splitting complex composite patterns into simpler sub-patterns[4] >> >> and >> >> > executing separate queries to detect those sub-patterns. This might >> >> > translate into different >> >> > tasks and duplicating the source datastreams to all the new generated >> >> > tasks. >> >> > >> >> > Also since I don't know how the Flink compiler behaves, would some of >> >> the >> >> > optimizations involve making changes to it too? >> >> > >> >> > Regards, >> >> > Akshay Dixit >> >> > >> >> > [1] : Efficient Pattern Matching over Event Streams >> >> > < >> http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf >> >> > >> >> > [2] : On Supporting Kleene Closure over Event Streams >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf> >> >> > [3] : Processing Flows of Information: From Data Stream to Complex >> Event >> >> > Processing >> >> > < >> >> >> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf >> >> > >> >> > [4] : Distributing Complex Event Detection >> >> > < >> >> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf> >> >> > >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi < >> >> balassi.mar...@gmail.com> >> >> > wrote: >> >> > >> >> >> Dear Akshay, >> >> >> >> >> >> Thanks again for your interest and for the recent contribution to >> >> >> streaming. >> >> >> >> >> >> Both of the projects mentioned wold be largely appreciated by the >> >> >> community, and you can also propose other project suggestions here >> for >> >> >> discussion. >> >> >> >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a starting >> point >> >> >> and >> >> >> indeed the basic solution can be implemented with filtering and >> >> >> windowing/mapping with some state storing whether the cause of an >> event >> >> >> has >> >> >> been already seen. Solely relying on the now existing windowing API >> >> this >> >> >> however might cause performance issues if the events also have an >> >> >> expiration timeout - some optimization there would be included. The >> >> >> further >> >> >> challenge is to try to further exploit the parallel job execution of >> >> Flink >> >> >> to possibly scale a pattern matching query. >> >> >> >> >> >> Best, >> >> >> >> >> >> Marton >> >> >> >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com >> > >> >> >> wrote: >> >> >> >> >> >> > Hi, >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. >> I'm >> >> >> > currently interested in distributed systems and stream processing >> >> and am >> >> >> > looking to delve deeper into the subject, and hope to get some >> >> insight >> >> >> by >> >> >> > contributing to Apache Flink. I've gathered some idea of the >> >> >> > flink-streaming codebase by recently working on a PR for >> >> FLINK-1450[2]. >> >> >> > >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects >> that I >> >> >> would >> >> >> > love to work on over the summer. I was wondering which amongst >> these >> >> >> would >> >> >> > be more appreciated by the community, so I can start working >> towards >> >> a >> >> >> > proposal for either one. >> >> >> > >> >> >> > Regarding FLINK-1534, I was wondering why would simply merging and >> >> >> > filtering the existing streams for events we want to detect not >> work? >> >> >> Also >> >> >> > on going through the document mentioned by @mbalassi in the JIRA >> >> >> > comment[5], the authors specify some Runtime Event Detection >> >> concepts in >> >> >> > Section 5.2. I'm assuming the project entails on building a >> similar >> >> >> analogy >> >> >> > using Flink and the deliverables would include working pattern >> >> matching >> >> >> > operators over Flink DataStreams as described in the report. If >> so, >> >> then >> >> >> > shouldn't it be trivial to implement the described the Binary >> >> operator >> >> >> > using a WindowedStream and a Filter? >> >> >> > I hope my questions don't seem misplaced here and I would >> appreciate >> >> >> links >> >> >> > to literature where I can learn more on the topic. >> >> >> > >> >> >> > Regards, >> >> >> > Akshay Dixit >> >> >> > >> >> >> > [1] : http://akshaydixi.me >> >> >> > [2] : https://github.com/apache/flink/pull/481 >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617 >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534 >> >> >> > [5] : >> >> >> > >> >> >> http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf >> >> >> > >> >> >> >> >> > >> >> > >> >> >> > >> > >> > >