Hi Anton,
I think I start to understand what you want to achieve. I would be a
very interesting thing to do ;)
If the drools are the piece that decides what is interesting and what is
not, then probably also drools should be responisble for keeping
state/aggregates - otherwise
Flink would keep too much stuff in memory.
I think that serializing session wouldn't be impossible. I still wonder
how to deal with clocks in Drools CEP - maybe the clock should be moved
by incoming events - but then there are problems with out of order ones,
or only by watermarks - but then you have worse latency.
One thing that is still not so clear to me is - how do you want to
partition the stream of tweets? I guess that it cannot be done by some
tweet property - e.g. author, because otherwise users would have many
sessions and aggregations would not be correct?
I may be wrong, but I think that to leverage Flink scaling/state
capabilities the processing logic should be somehow Flink-aware - that
is, Flink should understand (at least partially) what are specific rules...
maciek
On 23/06/2016 16:54, Anton wrote:
Hi Maciek
Firstly, thanks for your replies and interest. It is really appreciated. I
am familiar with Drools but am new to Flink. Whereas you seem familiar with
both - so your feedback is really appreciated.
On Thu, Jun 23, 2016 at 8:30 AM, Maciek Próchniak <m...@touk.pl> wrote:
you mean - keeping working memory facts in Flink state and with each event
throw them into stateless session?
Yes kind of.
Can you elaborate a bit on your use case? What would be the state?
*Use case*
Flink ingests a large amount of tweets.
Application users can write rules to reason of tweet contents and
frequency.
For example, user John writes a rule that says:
*When I receive 3 tweets about Apache Flink within 1 day*
*Then Flink is popular*
*Fire new Popular("Flink") event*
*When I have not received a tweet about Apache Attic project ___ in one
month*
*then this project is not popular*
*Fire new UnPopular("Attic project") event*
Another important aspect of the use-case is the frequency of rule
changes/updates. There will be many users writing rules to reason over
incoming events - and these rules are expected to change often.
The main goal is to make Drools CEP stateful. Part of the challenge is
ensuring that new messages are routed to the correct drools session. But,
as Flink already seems to have scaling and state well done - it would be
nice to combine the two.
Please have a look at
http://blog.athico.com/2016/03/high-availability-drools-stateless.html
Also, I cannot comment on the strengths and weaknesses of Flink CEP, but
Drools CEP is certainly very mature - and when combined with the rule
language - very powerful. To be able to combine the statefulness of Flink
with Drools would be amazing!
The concern with using Stateful sessions is the possibility of memory
leaks. However, a stateless session is just a wrapper around stateless -
which calls dispose. So this makes me think, perhaps, when populating the
working memory, just extract all relevant events from the Flink state into
the Drools WM - and it could even be stateful - and then call displose
after calling fireAllRules. And do this each time a new message/event
arrives.
Would it be for example some event aggregations, which would be filtered by
the rules?
Yes.