Thanks for going through it Gyula. I've made the necessary amends to the timeline and submitted the proposal.
Regards, Akshay Dixit On Thu, Mar 26, 2015 at 8:53 PM, Gyula Fóra <gyf...@apache.org> wrote: > I think it looks good for a start, we will have to work on the API a little > bit together to make it fit smoothly with what we currently have. > > There is a few gaps in the timeline but that you have probably noticed :) > > Otherwise +1 from me. > > On Wed, Mar 25, 2015 at 11:35 PM, Akshay Dixit <akshayd...@gmail.com> > wrote: > > > Hi, > > The link to the draft proposal that I've prepared is > > https://gist.github.com/akshaydixi/88f3fbcebab0119a6a31 > > It would be great if I could get some feedback on it. > > Regards, > > Akshay Dixit > > > > On Wed, Mar 25, 2015 at 2:03 AM, Akshay Dixit <akshayd...@gmail.com> > > wrote: > > > > > Thanks Gyula. > > > > > > I agree too that simple and working implementations are preferrable > over > > > hacky complex solutions. I'll start sketching out an initial > > straighforward > > > API with only basic pattern matching features > > > and base it on the existing windowing API. I'll post a draft of the > > > proposal, keeping the points you've said in mind, tomorrow, so you can > > > look it over to see if its all right. > > > Regards, > > > Akshay Dixit > > > > > > On Tue, Mar 24, 2015 at 6:30 PM, Gyula Fóra <gyf...@apache.org> wrote: > > > > > >> Hey Dixit, > > >> > > >> Sorry for the delay, I had to discuss this in more detail with some of > > our > > >> other core developers. > > >> > > >> The consensus seems to be that we would like push this project in a > > >> direction where the changes can be quickly included in the next > > releases. > > >> For this it is essential that we implement features that are complete > > (and > > >> clean) from the users perspective. This does not necessarily mean that > > we > > >> would like to have everything at once but rather that it is preferable > > to > > >> start with something clean and simple (for instance the naive chained > > >> filter approach) and progressively build more complex logic. > > >> > > >> This also mean that we would like to avoid "researchy" code in the > > >> codebase > > >> as much as possible. Of course once we have a stable api for this > > >> functionality we can work towards making the optimizations that you > have > > >> mentioned like operator sharing and so on. > > >> > > >> The ideal proposal would give a clear sketch of the pattern matching > API > > >> that you would like to implement, which might be some added operators > at > > >> first to the current API and possible a DSL later with more advanced > > >> functionality (this would probably go in a separate library until it > is > > >> very stable). > > >> > > >> So please in the proposal include a preview of what the pattern > matching > > >> syntax would look like integrated with the current operators, how it > > would > > >> interact with other parts of the system etc. > > >> > > >> These are the thing we need to figure out before we consider the > > >> optimizations I think, because it usually turns out, that the API > > >> semantics > > >> you would like to provide can hugely affect (probably limit) the > > >> possibilities that you have afterwards in terms of optimizations. > > >> > > >> Let me know if you have further questions regarding this :) > > >> > > >> Gyula > > >> > > >> On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra <gyf...@apache.org> > wrote: > > >> > > >> > Hey, > > >> > > > >> > Give me an hour or so as I am in a meeting currently, but I will get > > >> back > > >> > to you afterwards. > > >> > > > >> > Regards, > > >> > Gyula > > >> > > > >> > On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit < > akshayd...@gmail.com> > > >> > wrote: > > >> > > > >> >> Hi, > > >> >> It'd really help if I got a reply soon. It'll be helpful in writing > > the > > >> >> proposal since the deadline is on 27th. Thanks > > >> >> Regards, > > >> >> Akshay Dixit > > >> >> > > >> >> On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit < > akshayd...@gmail.com> > > >> >> wrote: > > >> >> > > >> >> > Thanks for the explanation Marton. I've decided to try out for > > >> >> FLINK-1534. > > >> >> > > > >> >> > After reading through the thesis[4] and a few other > > papers[1][2][3], > > >> I > > >> >> > believe I've gathered a little context to ask more questions. But > > I'm > > >> >> still > > >> >> > not sure how Flink's internals work > > >> >> > so please bear with me. Although the ongoing effort to document > the > > >> >> > architecture and internal is really helpful for newbies like me > and > > >> >> would > > >> >> > greatly decrease the ramping up time. > > >> >> > > > >> >> > Detecting a pattern of events would comprise of a pipeline that > > >> accepts > > >> >> > the pattern query and > > >> >> > sources of DataStreams, and outputs detected matches of that > > pattern > > >> to > > >> >> a > > >> >> > sink or forwards it > > >> >> > along to another stream for further computation. > > >> >> > > > >> >> > As you said, a simple filter-join-aggregate query system could be > > >> >> > developed implementing using the existing Streaming windowing > API. > > >> >> > But matching over complex events and decoding their pattern > queries > > >> >> would > > >> >> > require implementing a DSL that transforms queries into an > > evaluation > > >> >> > model. For e.g, > > >> >> > in [1], the authors have implemented an NFA automaton with a > shared > > >> >> > versioned buffer that models the queries. In [4], the authors > > >> >> > propose a new language that is much more expressive and compiles > > >> into a > > >> >> > topology graph for Storm. > > >> >> > > > >> >> > So in Flink's case, I believe the proposed DSL would generate > > >> operator > > >> >> > graphs for the Flink compiler to schedule Jobgraphs over > > >> TaskManagers. > > >> >> > If we don't depend on the Windowing API, would we need to create > > new > > >> >> > operators such as the Projection, Conjunction and Union operators > > >> >> defined > > >> >> > in [4] ? > > >> >> > Also I would like to hear your thoughts on how to approach > scaling > > >> the > > >> >> > pattern matching query. Note all these techniques talk about > > scaling > > >> a > > >> >> > single query. > > >> >> > I've read various ways such as > > >> >> > > > >> >> > 1. Merging equivalent runs[1] -: This seems a good way to squash > > >> >> multiple > > >> >> > instances of pattern matching forks into a single one if they > have > > >> the > > >> >> same > > >> >> > state. > > >> >> > But I'm not sure how we would implement this in Flink since this > > is a > > >> >> > runtime optimization. > > >> >> > > > >> >> > 2. Implementing a matched version buffer[1] -: This would > involve > > >> >> sharing > > >> >> > state of a buffer datastructure across multiple candidate match > > >> >> instances > > >> >> > for the pattern. > > >> >> > > > >> >> > 3. Splitting complex composite patterns into simpler > > sub-patterns[4] > > >> >> and > > >> >> > executing separate queries to detect those sub-patterns. This > might > > >> >> > translate into different > > >> >> > tasks and duplicating the source datastreams to all the new > > generated > > >> >> > tasks. > > >> >> > > > >> >> > Also since I don't know how the Flink compiler behaves, would > some > > of > > >> >> the > > >> >> > optimizations involve making changes to it too? > > >> >> > > > >> >> > Regards, > > >> >> > Akshay Dixit > > >> >> > > > >> >> > [1] : Efficient Pattern Matching over Event Streams > > >> >> > < > > >> > http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf > > >> >> > > > >> >> > [2] : On Supporting Kleene Closure over Event Streams > > >> >> > <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf > > > > >> >> > [3] : Processing Flows of Information: From Data Stream to > Complex > > >> Event > > >> >> > Processing > > >> >> > < > > >> >> > > >> > > > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf > > >> >> > > > >> >> > [4] : Distributing Complex Event Detection > > >> >> > < > > >> >> > > >> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf > > > > > >> >> > > > >> >> > On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi < > > >> >> balassi.mar...@gmail.com> > > >> >> > wrote: > > >> >> > > > >> >> >> Dear Akshay, > > >> >> >> > > >> >> >> Thanks again for your interest and for the recent contribution > to > > >> >> >> streaming. > > >> >> >> > > >> >> >> Both of the projects mentioned wold be largely appreciated by > the > > >> >> >> community, and you can also propose other project suggestions > here > > >> for > > >> >> >> discussion. > > >> >> >> > > >> >> >> Regarding FLINK-1534, the thesis I mentioned serves as a > starting > > >> point > > >> >> >> and > > >> >> >> indeed the basic solution can be implemented with filtering and > > >> >> >> windowing/mapping with some state storing whether the cause of > an > > >> event > > >> >> >> has > > >> >> >> been already seen. Solely relying on the now existing windowing > > API > > >> >> this > > >> >> >> however might cause performance issues if the events also have > an > > >> >> >> expiration timeout - some optimization there would be included. > > The > > >> >> >> further > > >> >> >> challenge is to try to further exploit the parallel job > execution > > of > > >> >> Flink > > >> >> >> to possibly scale a pattern matching query. > > >> >> >> > > >> >> >> Best, > > >> >> >> > > >> >> >> Marton > > >> >> >> > > >> >> >> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit < > > akshayd...@gmail.com > > >> > > > >> >> >> wrote: > > >> >> >> > > >> >> >> > Hi, > > >> >> >> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, > India. > > >> I'm > > >> >> >> > currently interested in distributed systems and stream > > processing > > >> >> and am > > >> >> >> > looking to delve deeper into the subject, and hope to get some > > >> >> insight > > >> >> >> by > > >> >> >> > contributing to Apache Flink. I've gathered some idea of the > > >> >> >> > flink-streaming codebase by recently working on a PR for > > >> >> FLINK-1450[2]. > > >> >> >> > > > >> >> >> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects > > >> that I > > >> >> >> would > > >> >> >> > love to work on over the summer. I was wondering which amongst > > >> these > > >> >> >> would > > >> >> >> > be more appreciated by the community, so I can start working > > >> towards > > >> >> a > > >> >> >> > proposal for either one. > > >> >> >> > > > >> >> >> > Regarding FLINK-1534, I was wondering why would simply merging > > and > > >> >> >> > filtering the existing streams for events we want to detect > not > > >> work? > > >> >> >> Also > > >> >> >> > on going through the document mentioned by @mbalassi in the > JIRA > > >> >> >> > comment[5], the authors specify some Runtime Event Detection > > >> >> concepts in > > >> >> >> > Section 5.2. I'm assuming the project entails on building a > > >> similar > > >> >> >> analogy > > >> >> >> > using Flink and the deliverables would include working pattern > > >> >> matching > > >> >> >> > operators over Flink DataStreams as described in the report. > If > > >> so, > > >> >> then > > >> >> >> > shouldn't it be trivial to implement the described the Binary > > >> >> operator > > >> >> >> > using a WindowedStream and a Filter? > > >> >> >> > I hope my questions don't seem misplaced here and I would > > >> appreciate > > >> >> >> links > > >> >> >> > to literature where I can learn more on the topic. > > >> >> >> > > > >> >> >> > Regards, > > >> >> >> > Akshay Dixit > > >> >> >> > > > >> >> >> > [1] : http://akshaydixi.me > > >> >> >> > [2] : https://github.com/apache/flink/pull/481 > > >> >> >> > [3] : https://issues.apache.org/jira/browse/FLINK-1617 > > >> >> >> > [4] : https://issues.apache.org/jira/browse/FLINK-1534 > > >> >> >> > [5] : > > >> >> >> > > > >> >> > > >> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf > > >> >> >> > > > >> >> >> > > >> >> > > > >> >> > > > >> >> > > >> > > > >> > > > >> > > > > > > > > >