Hi,
It'd really help if I got a reply soon. It'll be helpful in writing the
proposal since the deadline is on 27th. Thanks
Regards,
Akshay Dixit

On Sun, Mar 22, 2015 at 1:17 AM, Akshay Dixit <akshayd...@gmail.com> wrote:

> Thanks for the explanation Marton. I've decided to try out for FLINK-1534.
>
> After reading through the thesis[4] and a few other papers[1][2][3], I
> believe I've gathered a little context to ask more questions. But I'm still
> not sure how Flink's internals work
> so please bear with me. Although the ongoing effort to document the
> architecture and internal is really helpful for newbies like me and would
> greatly decrease the ramping up time.
>
> Detecting a pattern of events would comprise of a pipeline that accepts
> the pattern query and
> sources of DataStreams, and outputs detected matches of that pattern to a
> sink or forwards it
> along to another stream for further computation.
>
> As you said, a simple filter-join-aggregate query system could be
> developed implementing using the existing Streaming windowing API.
> But matching over complex events and decoding their pattern queries would
> require implementing a DSL that transforms queries into an evaluation
> model. For e.g,
> in [1], the authors have implemented an NFA automaton with a shared
> versioned buffer that models the queries. In [4], the authors
> propose a new language that is much more expressive and compiles into a
> topology graph for Storm.
>
> So in Flink's case, I believe the proposed DSL would generate operator
> graphs for the Flink compiler to schedule Jobgraphs over TaskManagers.
> If we don't depend on the Windowing API, would we need to create new
> operators such as the Projection, Conjunction and Union operators defined
> in [4] ?
> Also I would like to hear your thoughts on how to approach scaling the
> pattern matching query. Note all these techniques talk about scaling a
> single query.
> I've read various ways such as
>
> 1.  Merging equivalent runs[1] -: This seems a good way to squash multiple
> instances of pattern matching forks into a single one if they have the same
> state.
> But I'm not sure how we would implement this in Flink since this is a
> runtime optimization.
>
> 2.  Implementing a matched version buffer[1] -: This would involve sharing
> state of a buffer datastructure across multiple candidate match instances
> for the pattern.
>
> 3.  Splitting complex composite patterns into simpler sub-patterns[4] and
> executing separate queries to detect those sub-patterns. This might
> translate into different
> tasks and duplicating the source datastreams to all the new generated
> tasks.
>
> Also since I don't know how the Flink compiler behaves, would some of the
> optimizations involve making changes to it too?
>
> Regards,
> Akshay Dixit
>
> [1] : Efficient Pattern Matching over Event Streams
> <http://people.cs.umass.edu/~yanlei/publications/sase-sigmod08-long.pdf>
> [2] : On Supporting Kleene Closure over Event Streams
> <http://people.cs.umass.edu/~yanlei/publications/sase-icde08.pdf>
> [3] : Processing Flows of Information: From Data Stream to Complex Event
> Processing
> <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.396.1785&rep=rep1&type=pdf>
> [4] : Distributing Complex Event Detection
> <http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf>
>
> On Mon, Mar 16, 2015 at 3:22 PM, Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
>> Dear Akshay,
>>
>> Thanks again for your interest and for the recent contribution to
>> streaming.
>>
>> Both of the projects mentioned wold be largely appreciated by the
>> community, and you can also propose other project suggestions here for
>> discussion.
>>
>> Regarding FLINK-1534, the thesis I mentioned serves as a starting point
>> and
>> indeed the basic solution can be implemented with filtering and
>> windowing/mapping with some state storing whether the cause of an event
>> has
>> been already seen. Solely relying on the now existing windowing API this
>> however might cause performance issues if the events also have an
>> expiration timeout - some optimization there would be included. The
>> further
>> challenge is to try to further exploit the parallel job execution of Flink
>> to possibly scale a pattern matching query.
>>
>> Best,
>>
>> Marton
>>
>> On Sun, Mar 15, 2015 at 3:22 PM, Akshay Dixit <akshayd...@gmail.com>
>> wrote:
>>
>> > Hi,
>> > I'm Akshay Dixit[1], a 4th year undergrad at VIT Vellore, India. I'm
>> > currently interested in distributed systems and stream processing and am
>> > looking to delve deeper into the subject, and hope to get some insight
>> by
>> > contributing to Apache Flink. I've gathered some idea of the
>> > flink-streaming codebase by recently working on a PR for FLINK-1450[2].
>> >
>> > Both FLINK-1617[3] and FLINK-1534[4] are interesting projects that I
>> would
>> > love to work on over the summer. I was wondering which amongst these
>> would
>> > be more appreciated by the community, so I can start working towards a
>> > proposal for either one.
>> >
>> > Regarding FLINK-1534, I was wondering why would simply merging and
>> > filtering the existing streams for events we want to detect not work?
>> Also
>> > on going through the document mentioned by @mbalassi in the JIRA
>> > comment[5], the authors specify some Runtime Event Detection concepts in
>> > Section 5.2. I'm assuming the project entails on building a similar
>> analogy
>> > using Flink and the deliverables would include working pattern matching
>> > operators over Flink DataStreams as described in the report. If so, then
>> > shouldn't it be trivial to implement the described the Binary operator
>> > using a WindowedStream and a Filter?
>> > I hope my questions don't seem misplaced here and I would appreciate
>> links
>> > to literature where I can learn more on the topic.
>> >
>> > Regards,
>> > Akshay Dixit
>> >
>> > [1] : http://akshaydixi.me
>> > [2] : https://github.com/apache/flink/pull/481
>> > [3] : https://issues.apache.org/jira/browse/FLINK-1617
>> > [4] : https://issues.apache.org/jira/browse/FLINK-1534
>> > [5] :
>> > http://www.doc.ic.ac.uk/teaching/distinguished-projects/2012/k.nagy.pdf
>> >
>>
>
>

Reply via email to