Re: A use-case for Flink and reactive systems

2018-07-05 Thread Yersinia Ruckeri
My idea started from here: https://flink.apache.org/usecases.html
First use case describes what I am trying to realise (
https://flink.apache.org/img/usecases-eventdrivenapps.png)
My application is Flink, listening to incoming events, changing the state
of an object (really an aggregate here) and pushing out another event.
States can be persisted asynchronously; this is ok.

My point on top of this picture is that the "state" is not just about
persisting something, but about retrieving the current state, manipulating it
with the new information, and persisting the updated state.


Y

On Wed, Jul 4, 2018 at 10:16 PM, Mich Talebzadeh 
wrote:

> Looks interesting.
>
> As I understand you have a microservice based on ingestion where a topic
> is defined for streaming messages that include transactional data. These
> transactions should already exist in your DB. For now we look at DB as part
> of your microservices and we take a logical view of it.
>
> So
>
>
>1. First microservice M1 provides ingestion of a Kafka topic
>2. Second microservice M2 deploys Flink or Spark Streaming to
>manipulate the incoming messages. We can look at this later.
>3. Third microservice M3 consists of the database that provides current
>records for accounts identified by the account number in your message queue
>4. M3 will have to validate the incoming account number, update the
>transaction and provide completion handshake. Effectively you are providing
>DPaaS
>
>
> So far we have avoided interfaces among these services. But I gather M1
> and M2 are well established. Assuming that Couchbase is your choice of DB I
> believe it provides JDBC drivers of some sort. It does not have to be
> Couchbase. You can achieve the same with Hbase as well or MongoDB. Anyhow
> the only challenge I see here is the interface between your Flink
> application in M2 and M3
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Wed, 4 Jul 2018 at 17:56, Yersinia Ruckeri 
> wrote:
>
>> Hi all,
>>
>> I am working on a prototype which should include Flink in a reactive
>> systems software. The easiest use-case with a traditional bank system where
>> I have one microservice for transactions and another one for
>> account/balances.
>> Both are connected with Kafka.
>>
>> Transactions record a transaction and then send, via Kafka, a message
>> which includes the account identifier and the amount.
>> On the other microservice I want to have Flink consuming this topic and
>> updating the balance of my account based on the incoming message. It needs
>> to pull from the DB my data and make the update.
>> The DB is Couchbase.
>>
>> I spent few hours online today, but so far I only found there's no sink
>> for Couchbase, I need to build one which shouldn't be a big deal. I haven't
>> found information on how I can make Flink able to interact with a DB to
>> retrieve information and store information.
>>
>> I guess the case is a good case, as updating state in an event sourcing
>> based application is part of the first page of the manual. I am not looking
>> to just dump a state into a DB, but to interact with the DB: retrieving
>> data, elaborating them with the input coming from my queue, and persisting
>> them (especially if I want to make a second prototype using Flink CEP).
>>
>> I probably even read how to do it, but I didn't recognize.
>>
>> Can anybody help me to figure out better this part?
>>
>> Thanks,
>> Y.
>>
>


Re: [DISCUSS] Flink 1.6 features

2018-07-05 Thread Vishal Santoshi
I am not sure whether this is on any roadmap and, as someone suggested, wishes
are free... TensorFlow on Flink, though ambitious, would be a big win.
I am not sure how operator isolation for a hybrid GPU/CPU setup would be
achieved, or how repetitive execution could be natively supported by Flink.
But for a developer looking at Flink to fill the ML void that has emerged, and
thus being forced to choose between Spark and Flink, some forward-looking
announcement in the ML space seems required, and TensorFlow is the one marquee
project that everyone wants to get a handle on.

On Wed, Jul 4, 2018, 5:10 PM Stephan Ewen  wrote:

> Hi all!
>
> A late follow-up with some thoughts:
>
> In principle, all these are good suggestions and are on the roadmap. We
> are trying to make the release "by time", meaning for it at a certain date
> (roughly in two weeks) and take what features are ready into the release.
>
> Looking at the status of the features that you suggested (please take this
> with a grain of salt, I don't know the status of all issues perfectly):
>
>   - *Proper sliding window joins:* This is happening, see
> https://issues.apache.org/jira/browse/FLINK-8478
> At least inner joins will most likely make it into 1.6. Related to
> that is enrichment joins against time versioned tables, which are being
> worked on in the Table API:
> https://issues.apache.org/jira/browse/FLINK-9712
>
>   - *Bucketing Sink on S3 and ORC / Parquet Support:* There are multiple
> efforts happening here.
> The biggest one is that the bucketing sink is getting a complete
> overhaul to support all of the mentioned features, see
> https://issues.apache.org/jira/browse/FLINK-9749
> https://issues.apache.org/jira/browse/FLINK-9752
> https://issues.apache.org/jira/browse/FLINK-9753
> Hard to say how much will make it until the feature freeze of 1.6, but
> this is happening and will be merged soon.
>
>   - *ElasticBloomFilters:* Quite a big feature, I added a reply to the
> discussion thread, looking at whether this can be realized in a more
> loosely coupled way from the core state abstraction.
>
>   - *Per partition Watermark idleness:* Noted, let's look at this more.
> There should also be a way to implement this today, with periodic
> watermarks (that realize when no records came for a while).
>
>   - *Dynamic CEP patterns:* I don't know if anyone if working on this.
>CEP is getting some love at the moment, though, with SQL integration
> and better performance on RocksDB for larger patterns. We'll take a note
> that there are various requests for dynamic patterns.
>
>   - *Kubernetes Integration:* There are some developments going on both
> for a "passive" k8s integration (jobs as docker images, Flink being a
> transparent k8s citizen) and a more "active" integration, where Flink
> directly talks to k8s to start TaskManagers dynamically. I think the former
> has a good chance that a first version goes into 1.6, the latter needs more
> work.
>
>   - *Atomic Cancel With Savepoint:* Not enough progress on this, yet. It
> is important, but needs more work.
>   - *Synchronizing streams:* Same as above, acknowledged that this is
> important, but needs more work.
>
>   - *Standalone cluster job isolation:* No work on this, yet, as far as I
> know.
>
>   - *Sharing state across operators:* This is an interesting and tricky
> one, I left some questions on the JIRA issue
> https://issues.apache.org/jira/browse/FLINK-6239
>
> Best,
> Stephan
>
>
> On Mon, Jun 25, 2018 at 4:45 AM, zhangminglei <18717838...@163.com> wrote:
>
>> Hi, Community
>>
>> By the way, there is a very important feature I think it should be.
>> Currently, the BucketingSink does not support when a bucket is ready for
>> user use. This situation will be very obvious when flink work with offline
>> end. We called that real time/offline integration in business. In this
>> case, we should let the user can do some extra work when the bucket is
>> ready. And here is the JIRA for this
>> https://issues.apache.org/jira/browse/FLINK-9609
>>
>> Cheers
>> Minglei
>>
>> 在 2018年6月4日,下午5:21,Stephan Ewen  写道:
>>
>> Hi Flink Community!
>>
>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good
>> time to start talking about what to do for release 1.6.
>>
>> *== Suggested release timeline ==*
>>
>> I would propose to release around *end of July* (that is 8-9 weeks from
>> now).
>>
>> The rational behind that: There was a lot of effort in release testing
>> automation (end-to-end tests, scripted stress tests) as part of release
>> 1.5. You may have noticed the big set of new modules under
>> "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5
>> release a bit, and needs to continue as part of the coming release cycle,
>> but should help make releasing more lightweight from now on.
>>
>> (Side note: There are also some nightly stress tests that we created and
>> run at data Artisans, and where we are looking whether and in which way it
>> wou

Slide Window Compute Optimization

2018-07-05 Thread YennieChen88
Hi,
    I want to use sliding windows with a 1 hour window size and a 1 second step
size. I found that once an element arrives, it is processed in 3600
windows serially through one thread. It takes several seconds to finish
processing one element, much more than I expected. Is there any way to
optimize this?
Thank you very much for your reply.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: A use-case for Flink and reactive systems

2018-07-05 Thread Fabian Hueske
Hi Yersinia,

The main idea of an event-driven application is to hold the state (i.e.,
the account data) in the streaming application and not in an external
database like Couchbase.
This design is very scalable (state is partitioned) and avoids look-ups
from the external database because all state interactions are local.
Basically, you are moving the database into the streaming application.
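As a rough sketch of this pattern (the Transaction and BalanceUpdated types are
made up for illustration), the per-account balance can be kept in keyed state
that is read, updated, and emitted for every incoming event:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

// Hypothetical event types, just for illustration.
case class Transaction(accountId: String, amount: Long)
case class BalanceUpdated(accountId: String, newBalance: Long)

class BalanceFunction extends RichFlatMapFunction[Transaction, BalanceUpdated] {

  // Keyed state: one balance per account, managed by Flink (e.g. in RocksDB).
  private var balance: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    balance = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Long]("balance", classOf[java.lang.Long]))
  }

  override def flatMap(tx: Transaction, out: Collector[BalanceUpdated]): Unit = {
    // Retrieve the current state, apply the new information, persist the update.
    val current = Option(balance.value()).map(_.longValue()).getOrElse(0L)
    val updated = current + tx.amount
    balance.update(updated)
    out.collect(BalanceUpdated(tx.accountId, updated))
  }
}

// usage (sketch): transactions.keyBy(_.accountId).flatMap(new BalanceFunction)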

There are a few things to consider if you maintain the state in the
application:
- You might need to mirror the state in an external database to make it
queryable and available while the streaming application is down.
- You need to have a good design to ensure that your consistency
requirements are met in case of a failure (upsert writes can temporarily
reset the external state).
- The design becomes much more challenging if you need to access the state
of two accounts to perform a transaction (subtract from the first and add
to the second account) because Flink state is distributed per key and does
not support remote lookups.

If you do not want to store the state in the Flink application, I agree
with Jörn that there's no need for Flink.

Hope this helps,
Fabian

2018-07-05 10:45 GMT+02:00 Yersinia Ruckeri :

> My idea started from here: https://flink.apache.org/usecases.html
> First use case describes what I am trying to realise (
> https://flink.apache.org/img/usecases-eventdrivenapps.png)
> My application is Flink, listening to incoming events, changing the state
> of an object (really an aggregate here) and pushing out another event.
> States can be persisted asynchronously; this is ok.
>
> My point on top to this picture is that the "state" it's not just
> persisting something, but retrieving the current state, manipulate it new
> information and persist the updated state.
>
>
> Y
>
> On Wed, Jul 4, 2018 at 10:16 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Looks interesting.
>>
>> As I understand you have a microservice based on ingestion where a topic
>> is defined for streaming messages that include transactional data. These
>> transactions should already exist in your DB. For now we look at DB as part
>> of your microservices and we take a logical view of it.
>>
>> So
>>
>>
>>1. First microservice M1 provides ingestion of kafka yopic
>>2. Second microservice M2 deploys Flink or Spark Streaming to
>>manipulate the incoming messages. We can look at this later.
>>3. Third microservice M3 consist of the database that provides
>>current records for accounts identified by the account number in your
>>message queue
>>4. M3 will have to validate the incoming account number, update the
>>transaction and provide completion handshake. Effectively you are 
>> providing
>>DPaaS
>>
>>
>> So far we have avoided interfaces among these services. But I gather M1
>> and M2 are well established. Assuming that Couchbase is your choice of DB I
>> believe it provides JDBC drivers of some sort. It does not have to be
>> Couchbase. You can achieve the same with Hbase as well or MongoDB. Anyhow
>> the only challenge I see here is the interface between your Flink
>> application in M2 and M3
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Wed, 4 Jul 2018 at 17:56, Yersinia Ruckeri 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am working on a prototype which should include Flink in a reactive
>>> systems software. The easiest use-case with a traditional bank system where
>>> I have one microservice for transactions and another one for
>>> account/balances.
>>> Both are connected with Kafka.
>>>
>>> Transactions record a transaction and then send, via Kafka, a message
>>> which include account identifer and the amount.
>>> On the other microservice I want to have Flink consuming this topic and
>>> updating the balance of my account based on the incoming message. it needs
>>> to pull from the DB my data and make the update.
>>> The DB is Couchbase.
>>>
>>> I spent few hours online today, but so far I only found there's no sink
>>> for Couchbase, I need to build one which shouldn't be a big deal. I haven't
>>> found information on how I can make Flink able to interact with a DB to
>>> retrieve information and store information.
>>>
>>> I guess the case is a good case, as updating state in an event sourcing
>>> based application is part of the first page of the manual. I am not looking
>>> to just dump a state

Re: How to implement Multi-tenancy in Flink

2018-07-05 Thread Ahmad Hassan
HI Chesnay,

Yes, this is something we would eventually be doing, and then maintaining the
configuration of which tenants are mapped to which Flink jobs.

This would reduce the number of Flink jobs to maintain in order to support
1000s of tenants in our use case.

Thanks.

On Wed, 4 Jul 2018 at 12:00, Chesnay Schepler  wrote:

> Would it be feasible for you to partition your tenants across jobs, like
> for example 100 customers per job?
>
> On 04.07.2018 12:53, Ahmad Hassan wrote:
>
> Hi Fabian,
>
> One job per tenant model soon becomes hard to maintain. For example 1000
> tenants would require 1000 Flink and providing HA and resilience for 1000
> jobs is not so trivial solution.
>
> This is why we are hoping to get single flink job handling all the tenants
> through keyby tenant. However this also does not scale with growing number
> of tenants and putting all load on single Flink job.
>
> So I was wondering how other users are handling multitenancy in flink at
> scale.
>
> Best Regards,
>
> On Wed, 4 Jul 2018 at 11:40, Fabian Hueske  wrote:
>
>> Hi Ahmad,
>>
>> Some tricks that might help to bring down the effort per tenant if you
>> run one job per tenant (or key per tenant):
>>
>> - Pre-aggregate records in a 5 minute Tumbling window. However,
>> pre-aggregation does not work for FoldFunctions.
>> - Implement the window as a custom ProcessFunction that maintains a state
>> of 288 events and aggregates and retracts the pre-aggregated records.
>>
>> Best, Fabian
>>
>>
>> 2018-07-03 15:22 GMT+02:00 Ahmad Hassan :
>>
>>> Hi Folks,
>>>
>>> We are using Flink to capture various interactions of a customer with
>>> ECommerce store i.e. product views, orders created. We run 24 hour sliding
>>> window 5 minutes apart which makes 288 parallel windows for a single
>>> Tenant. We implement Fold Method that has various hashmaps to update the
>>> statistics of customers from the incoming Ecommerce event one by one. As
>>> soon as the event arrives, the fold method updates the statistics in
>>> hashmaps.
>>>
>>> Considering 1000 Tenants, we have two solutions in mind:
>>>
>>> !) Implement a flink job per tenant. So 1000 tenants would create 1000
>>> flink jobs
>>>
>>> 2) Implement a single flink with keyBy 'tenant' so that each tenant gets
>>> a separate window. But this will end up in creating 1000 * 288 number of
>>> windows in 24 hour period. This would cause extra load on single flink job.
>>>
>>> What is recommended approach to handle multitenancy in flink at such a
>>> big scale with over 1000 tenants while storing the fold state for each
>>> event. Solution I would require significant effort to keep track of 1000
>>> flink jobs and provide resilience.
>>>
>>> Thanks.
>>>
>>> Best Regards,
>>>
>>
>>
>


Re: Slide Window Compute Optimization

2018-07-05 Thread Kostas Kloudas
Hi,

You are correct that with sliding windows you will have 3600 “open windows” at 
any point.
Could you describe a bit more what you want to do?

If you simply want to have an update of something like a counter every second,
then you can implement your own logic with a ProcessFunction, which allows you
to handle state and timers in a custom way (see [1]).

Hope this helps,
Kostas

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/process_function.html
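To sketch what such a ProcessFunction could look like (this is only an
illustration, using processing time and made-up type parameters: it
pre-aggregates per second and emits a rolling one-hour count once per second,
instead of assigning every element to 3600 overlapping windows):

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// Must be applied on a keyed stream, since it uses keyed state and timers.
class RollingHourlyCount[T] extends ProcessFunction[T, Long] {

  private var countsPerSecond: MapState[java.lang.Long, java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    countsPerSecond = getRuntimeContext.getMapState(
      new MapStateDescriptor[java.lang.Long, java.lang.Long](
        "countsPerSecond", classOf[java.lang.Long], classOf[java.lang.Long]))
  }

  override def processElement(value: T,
                              ctx: ProcessFunction[T, Long]#Context,
                              out: Collector[Long]): Unit = {
    // Increment the counter bucket for the current second.
    val second = ctx.timerService().currentProcessingTime() / 1000
    val current = Option(countsPerSecond.get(second)).map(_.longValue()).getOrElse(0L)
    countsPerSecond.put(second, current + 1)
    // One timer per second boundary; duplicate registrations are deduplicated.
    ctx.timerService().registerProcessingTimeTimer((second + 1) * 1000)
  }

  override def onTimer(timestamp: Long,
                       ctx: ProcessFunction[T, Long]#OnTimerContext,
                       out: Collector[Long]): Unit = {
    // Sum the buckets of the last hour, drop older buckets, emit the rolling count.
    val cutoff = timestamp / 1000 - 3600
    var sum = 0L
    val expired = scala.collection.mutable.ListBuffer[java.lang.Long]()
    for (entry <- countsPerSecond.entries().asScala) {
      if (entry.getKey <= cutoff) expired += entry.getKey else sum += entry.getValue
    }
    expired.foreach(countsPerSecond.remove)
    out.collect(sum)
  }
}

// usage (sketch): stream.keyBy(...).process(new RollingHourlyCount[MyEvent])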
 

 

> On Jul 5, 2018, at 12:12 PM, YennieChen88  wrote:
> 
> Hi,
>I want to use slide windows of 1 hour window size and 1 second step
> size. I found that once a element arrives, it will be processed in 3600
> windows serially through one thread. It takes serveral seconds to finish one
> element processing,much more than my expection. Do I have any way to
> optimizate it?
>Thank you very much for your reply.
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



Re: [Table API/SQL] Finding missing spots to resolve why 'no watermark' is presented in Flink UI

2018-07-05 Thread Fabian Hueske
Hi,

Thanks for the PR! I'll have a look at it later today.

The problem of the retraction stream conversion is probably that the return
type is a Tuple2[Boolean, Row].
The boolean flag indicates whether the row is added or retracted.

Best, Fabian

2018-07-04 15:38 GMT+02:00 Jungtaek Lim :

> Thanks Fabian, filed FLINK-9742 [1].
>
> I'll submit a PR for FLINK-8094 via providing my TimestampExtractor. The
> implementation is also described as FLINK-9742. I'll start with current
> implementation which just leverages automatic cast from STRING to
> SQL_TIMESTAMP, but we could improve it from PR. Feedbacks are welcome!
>
> Btw, maybe need to initiate from another thread, but I also had to
> struggle to find a solution to convert table to retract stream. Looks like
> "implicit conversion" comes into play prior to toRetractStream and raise
> error. outTable is the result of "distinct" which looks like requiring
> retract mode. (Not even easy for me to know I should provide implicit
> TypeInformation for Row, but I'm fairly new to Scala so it's just me.)
>
> // below doesn't work as below line implicitly converts table as 'append 
> stream'
> // via org.apache.flink.table.api.scala.package$.table2RowDataStream
> // though we are calling toRetractStream
> //outTable.toRetractStream[Row](outTable.dataType).print()
>
> implicit val typeInfo = Types.ROW(outTable.getSchema.getColumnNames,
>   outTable.getSchema.getTypes)
> tableEnv.toRetractStream[Row](outTable).print()
>
>
> Thanks again,
> Jungtaek Lim (HeartSaVioR)
>
> [1] https://issues.apache.org/jira/browse/FLINK-9742
>
> 2018년 7월 4일 (수) 오후 10:03, Fabian Hueske 님이 작성:
>
>> Hi,
>>
>> Glad you could get it to work! That's great :-)
>>
>> Regarding you comments:
>>
>> 1) Yes, I think we should make resultType() public. Please open a Jira
>> issue and describe your use case.
>> Btw. would you like to contribute your TimestampExtractor to Flink (or
>> even a more generic one that allows to configure the format of the
>> timestamp string)? There is FLINK-8094 [1].
>> 2) This is "expected" because you define two different schemas, the JSON
>> schema which defines how to read the data and the Table schema that defines
>> how it is exposed to the Table API / SQL.
>>
>> Thanks, Fabian
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-8094
>>
>> 2018-07-04 14:52 GMT+02:00 Jungtaek Lim :
>>
>>> Thanks again Fabian for providing nice suggestion!
>>>
>>> Finally I got it working with applying your suggestion. Couple of tricks
>>> was needed:
>>>
>>> 1. I had to apply a hack (create new TimestampExtractor class to package
>>> org.apache.flink.blabla...) since Expression.resultType is defined as
>>> "package private" for flink. I feel adjusting scope of Explain's methods
>>> (at least resultType) to "public" would help on implementing custom
>>> TimestampExtractor in users' side: please let me know your thought about
>>> this. If you think it makes sense, I will file an issue and submit a PR, or
>>> initiate a new thread in dev mailing list to discuss it if the step is
>>> recommend.
>>>
>>> 2. To ensure KafkaTableSource's verification of rowtime field type, the
>>> type of field (here in "eventTime") should be defined as SQL_TIMESTAMP
>>> whereas the type of field in JSON should be defined as STRING.
>>>
>>> Kafka010JsonTableSource.builder()
>>>   .forTopic(topic)
>>>   .withSchema(TableSchema.builder()
>>> .field("eventTime", Types.SQL_TIMESTAMP)
>>> .build())
>>>   .forJsonSchema(TableSchema.builder()
>>> .field("eventTime", Types.STRING)
>>> .build())
>>>   .withKafkaProperties(prop)
>>>   .withRowtimeAttribute(
>>> "eventTime",
>>> new IsoDateStringAwareExistingField("eventTime"),
>>> new BoundedOutOfOrderTimestamps(Time.minutes(1).toMilliseconds)
>>>   )
>>>   .build()
>>>
>>> Thanks again!
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> 2018년 7월 4일 (수) 오후 8:18, Fabian Hueske 님이 작성:
>>>
 Hi Jungtaek,

 If it is "only" about the missing support to parse a string as
 timestamp, you could also implement a custom TimestampExtractor that works
 similar to the ExistingField extractor [1].
 You would need to adjust a few things and use the expression
 "Cast(Cast('tsString, SqlTimeTypeInfo.TIMESTAMP), Types.LONG)" to convert
 the String to a Long.
 So far this works only if the date is formatted like "2018-05-28
 12:34:56.000"

 Regarding the side outputs, these would not be handled as results but
 just redirect late records into separate data streams. We would offer a
 configuration to write them to a sink like HDFS or Kafka.

 Best, Fabian

 [1] https://github.com/apache/flink/blob/master/flink-
 libraries/flink-table/src/main/scala/org/apache/flink/
 table/sources/tsextractors/ExistingField.scala

 2018-07-04 11:54 GMT+02:00 Jungtaek Lim :

> Thanks Chesnay! Great news to hear. I'll try out with latest master
> branch.
>
> Thanks Fabian for provid

Re: A use-case for Flink and reactive systems

2018-07-05 Thread Yersinia Ruckeri
It helps, at least it's fairly clear now.
I am not against storing the state in Flink, but as per your first point,
I need to get it persisted, asynchronously, in an external database too, to
let other possible applications/services retrieve the state.
What you are saying is that I can either use Flink and forget the database
layer, or build a Java microservice with a database; mixing Flink with a
database doesn't make any sense.
My concern with the database is how you work out the previous state to
calculate the new one. Is it good and fast? (Moving money from account A to
B isn't a problem because you have two separate events.)

Moreover, a second PoC I was considering is related to Flink CEP. Let's say
I am processing sensor data and I want to have a rule which works on the
following principle:
- If the temperature is more than 40
- If the temperature yesterday at noon was more than 40
- If no one used vents yesterday and two days ago
then do something/emit some event.

This simple CEP example requires you to mine previous data/states from a
DB, right? Can Flink be considered for it without an external DB, relying only
on its internal RocksDB?

Hope I am not generating more confusion here.

Y

On Thu, Jul 5, 2018 at 11:21 AM, Fabian Hueske  wrote:

> Hi Yersinia,
>
> The main idea of an event-driven application is to hold the state (i.e.,
> the account data) in the streaming application and not in an external
> database like Couchbase.
> This design is very scalable (state is partitioned) and avoids look-ups
> from the external database because all state interactions are local.
> Basically, you are moving the database into the streaming application.
>
> There are a few things to consider if you maintain the state in the
> application:
> - You might need to mirror the state in an external database to make it
> queryable and available while the streaming application is down.
> - You need to have a good design to ensure that your consistency
> requirements are met in case of a failure (upsert writes can temporarily
> reset the external state).
> - The design becomes much more challenging if you need to access the state
> of two accounts to perform a transaction (subtract from the first and add
> to the second account) because Flink state is distributed per key and does
> not support remote lookups.
>
> If you do not want to store the state in the Flink application, I agree
> with Jörn that there's no need for Flink.
>
> Hope this helps,
> Fabian
>
> 2018-07-05 10:45 GMT+02:00 Yersinia Ruckeri :
>
>> My idea started from here: https://flink.apache.org/usecases.html
>> First use case describes what I am trying to realise (
>> https://flink.apache.org/img/usecases-eventdrivenapps.png)
>> My application is Flink, listening to incoming events, changing the state
>> of an object (really an aggregate here) and pushing out another event.
>> States can be persisted asynchronously; this is ok.
>>
>> My point on top to this picture is that the "state" it's not just
>> persisting something, but retrieving the current state, manipulate it new
>> information and persist the updated state.
>>
>>
>> Y
>>
>> On Wed, Jul 4, 2018 at 10:16 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Looks interesting.
>>>
>>> As I understand you have a microservice based on ingestion where a topic
>>> is defined for streaming messages that include transactional data. These
>>> transactions should already exist in your DB. For now we look at DB as part
>>> of your microservices and we take a logical view of it.
>>>
>>> So
>>>
>>>
>>>1. First microservice M1 provides ingestion of kafka yopic
>>>2. Second microservice M2 deploys Flink or Spark Streaming to
>>>manipulate the incoming messages. We can look at this later.
>>>3. Third microservice M3 consist of the database that provides
>>>current records for accounts identified by the account number in your
>>>message queue
>>>4. M3 will have to validate the incoming account number, update the
>>>transaction and provide completion handshake. Effectively you are 
>>> providing
>>>DPaaS
>>>
>>>
>>> So far we have avoided interfaces among these services. But I gather M1
>>> and M2 are well established. Assuming that Couchbase is your choice of DB I
>>> believe it provides JDBC drivers of some sort. It does not have to be
>>> Couchbase. You can achieve the same with Hbase as well or MongoDB. Anyhow
>>> the only challenge I see here is the interface between your Flink
>>> application in M2 and M3
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> ar

Re: Facing issue in RichSinkFunction

2018-07-05 Thread Hequn Cheng
Hi Amol,

The implementation of the RichSinkFunction probably contains a field that
is not serializable. To avoid the serialization exception, you can:
1. Mark the field as transient. This makes the serialization mechanism
skip the field.
2. If the field is part of the object's persistent state, make the type of the
field implement Serializable.

Furthermore, you can remove fields one by one to locate the problematic field.
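For example, a rough sketch of option 1 in Scala (class name, connection
details and the com.mongodb.MongoClient client class are assumptions here, not
taken from your code):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class MongoSink extends RichSinkFunction[String] {

  // transient: the client is not serialized and shipped with the function,
  // it is created on each task manager in open() instead.
  @transient private var client: com.mongodb.MongoClient = _

  override def open(parameters: Configuration): Unit = {
    // Hypothetical connection details.
    client = new com.mongodb.MongoClient("mongo-host", 27017)
  }

  override def invoke(value: String): Unit = {
    // ... write `value` to MongoDB using `client` ...
  }

  override def close(): Unit = {
    if (client != null) client.close()
  }
}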

Best, Hequn


On Thu, Jul 5, 2018 at 5:23 PM, Amol S - iProgrammer 
wrote:

> Hello folks,
>
> I am trying to write my streaming result into mongodb using
> RIchSinkFunction as below.
>
> gonogoCustomerApplicationStream.addSink(mongoSink)
>
> where mongoSink is Autowired i.e. injected object and it is giving me below
> error.
>
> The implementation of the RichSinkFunction is not serializable. The object
> probably contains or references non serializable fields.
>
> what is solution on this?
>
> ---
> *Amol Suryawanshi*
> Java Developer
> am...@iprogrammer.com
>
>
> *iProgrammer Solutions Pvt. Ltd.*
>
>
>
> *Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
> Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune - 411016,
> MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
> www.iprogrammer.com 
> 
>


Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Puneet Kinra
Hi Aarti

Flink doesn't support connecting multiple streams with heterogeneous schemas;
you can try the solution below:

a) If stream A is sending some events, make its output a String/JSON string.

b) If stream B is sending some events, make its output a String/JSON string.

c) Now, using the union function, you can connect all the streams and use a
FlatMap or process function to evaluate all these streams against your defined
rules.

d) For storing your aggregations and rules, you can build your own cache layer
and pass it as an argument to the constructor of that flatmap.








On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:

> Hi,
>
> We are currently evaluating Flink to build a real time rule engine that
> looks at events in a stream and evaluates them against a set of rules.
>
> The rules are dynamically configured and can be of three types -
> 1. Simple Conditions - these require you to look inside a single event.
> Example, match rule if A happens.
> 2. Aggregations - these require you to aggregate multiple events. Example,
> match rule if more than five A's happen.
> 3. Complex patterns - these require you to look at multiple events and
> detect patterns. Example, match rule if A happens and then B happens.
>
> Since the rules are dynamically configured, we cannot use CEP.
>
> As an alternative, we are using connected streams and the CoFlatMap
> function to store the rules in shared state, and evaluate each incoming
> event against the stored rules.  Implementation is similar to what's
> outlined here
> 
> .
>
> My questions -
>
>1. Since the CoFlatMap function works on a single event, how do we
>evaluate rules that require aggregations across events. (Match rule if more
>than 5 A events happen)
>2. Since the CoFlatMap function works on a single event, how do we
>evaluate rules that require pattern detection across events (Match rule if
>A happens, followed by B).
>3. How do you dynamically define a window function.
>
>
> --Aarti
>
>
> --
> Aarti Gupta 
> Director, Engineering, Correlation
>
>
> aagu...@qualys.com
> T
>
>
> Qualys, Inc. – Blog  | Community
>  | Twitter 
>
>
> 
>



-- 
*Cheers *

*Puneet Kinra*

*Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
*

*e-mail :puneet.ki...@customercentria.com
*


Batch Processing

2018-07-05 Thread Gaurav Sehgal
Hello,
   I am looking for a batch processing framework which will read data in
batches from MongoDB, enrich it using another data source, and then
upload it to Elasticsearch. Is Flink a good framework for such a use case?

Regards,
Gaurav


Release Notes - Flink 1.6

2018-07-05 Thread Puneet Kinra
Hi

Not able to see the Flink 1.6 release notes.


-- 
*Cheers *

*Puneet Kinra*

*Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
*

*e-mail :puneet.ki...@customercentria.com
*


Re: Release Notes - Flink 1.6

2018-07-05 Thread Chesnay Schepler
That's because 1.6 isn't released yet. It's scheduled to be released at 
the end of July.


The latest stable version is 1.5.0.

On 05.07.2018 14:49, Puneet Kinra wrote:

Hi

Not able to see 1.6 flink release notes.


--
*Cheers *
*
*
*Puneet Kinra*
*
*

*Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com 
*


*e-mail :puneet.ki...@customercentria.com 
*







Re: A use-case for Flink and reactive systems

2018-07-05 Thread Mich Talebzadeh
Hi Fabian,

On your point below

… Basically, you are moving the database into the streaming application.

This assumes a finite size for the data the streaming application has to
persist. In terms of capacity planning, how does this work?

Some applications, like fraud detection, try to address this by deploying
databases like Aerospike, where the keys are kept in memory and indexed and
the values are stored on SSD devices. I was wondering how Flink in general
can address this?


Regards,


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 5 Jul 2018 at 11:22, Fabian Hueske  wrote:

> Hi Yersinia,
>
> The main idea of an event-driven application is to hold the state (i.e.,
> the account data) in the streaming application and not in an external
> database like Couchbase.
> This design is very scalable (state is partitioned) and avoids look-ups
> from the external database because all state interactions are local.
> Basically, you are moving the database into the streaming application.
>
> There are a few things to consider if you maintain the state in the
> application:
> - You might need to mirror the state in an external database to make it
> queryable and available while the streaming application is down.
> - You need to have a good design to ensure that your consistency
> requirements are met in case of a failure (upsert writes can temporarily
> reset the external state).
> - The design becomes much more challenging if you need to access the state
> of two accounts to perform a transaction (subtract from the first and add
> to the second account) because Flink state is distributed per key and does
> not support remote lookups.
>
> If you do not want to store the state in the Flink application, I agree
> with Jörn that there's no need for Flink.
>
> Hope this helps,
> Fabian
>
> 2018-07-05 10:45 GMT+02:00 Yersinia Ruckeri :
>
>> My idea started from here: https://flink.apache.org/usecases.html
>> First use case describes what I am trying to realise (
>> https://flink.apache.org/img/usecases-eventdrivenapps.png)
>> My application is Flink, listening to incoming events, changing the state
>> of an object (really an aggregate here) and pushing out another event.
>> States can be persisted asynchronously; this is ok.
>>
>> My point on top to this picture is that the "state" it's not just
>> persisting something, but retrieving the current state, manipulate it new
>> information and persist the updated state.
>>
>>
>> Y
>>
>> On Wed, Jul 4, 2018 at 10:16 PM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Looks interesting.
>>>
>>> As I understand you have a microservice based on ingestion where a topic
>>> is defined for streaming messages that include transactional data. These
>>> transactions should already exist in your DB. For now we look at DB as part
>>> of your microservices and we take a logical view of it.
>>>
>>> So
>>>
>>>
>>>1. First microservice M1 provides ingestion of kafka yopic
>>>2. Second microservice M2 deploys Flink or Spark Streaming to
>>>manipulate the incoming messages. We can look at this later.
>>>3. Third microservice M3 consist of the database that provides
>>>current records for accounts identified by the account number in your
>>>message queue
>>>4. M3 will have to validate the incoming account number, update the
>>>transaction and provide completion handshake. Effectively you are 
>>> providing
>>>DPaaS
>>>
>>>
>>> So far we have avoided interfaces among these services. But I gather M1
>>> and M2 are well established. Assuming that Couchbase is your choice of DB I
>>> believe it provides JDBC drivers of some sort. It does not have to be
>>> Couchbase. You can achieve the same with Hbase as well or MongoDB. Anyhow
>>> the only challenge I see here is the interface between your Flink
>>> application in M2 and M3
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary da

Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Fabian Hueske
Hi,

> Flink doesn't support connecting multiple streams with heterogeneous
schema

This is not correct.
Flink is very well able to connect streams with different schema. However,
you cannot union two streams with different schema.
In order to reconfigure an operator with changing rules, you can use
BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].

In order to dynamically reconfigure aggregations and windowing, you would
need to implement the processing logic yourself in the process function
using state and timers.
There is no built-in support to reconfigure such operators.

Best,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html
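As a small sketch of that broadcast state pattern (event and rule types are
made up, and only simple single-event conditions are shown; aggregations and
patterns would additionally need your own state and timers as described above):

import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

object DynamicRules {

  // Hypothetical types, for illustration only.
  case class Event(key: String, payload: String)
  case class Rule(id: String, keyword: String)

  // Descriptor for the broadcast state that holds the current rule set.
  val ruleDescriptor = new MapStateDescriptor[String, Rule](
    "rules", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(classOf[Rule]))

  class RuleEvaluator
      extends KeyedBroadcastProcessFunction[String, Event, Rule, String] {

    override def processElement(
        event: Event,
        ctx: KeyedBroadcastProcessFunction[String, Event, Rule, String]#ReadOnlyContext,
        out: Collector[String]): Unit = {
      // Evaluate the event against every rule currently in broadcast state.
      for (entry <- ctx.getBroadcastState(ruleDescriptor).immutableEntries().asScala) {
        if (event.payload.contains(entry.getValue.keyword)) {
          out.collect(s"rule ${entry.getKey} matched for key ${event.key}")
        }
      }
    }

    override def processBroadcastElement(
        rule: Rule,
        ctx: KeyedBroadcastProcessFunction[String, Event, Rule, String]#Context,
        out: Collector[String]): Unit = {
      // Every parallel instance receives the rule and stores / overwrites it.
      ctx.getBroadcastState(ruleDescriptor).put(rule.id, rule)
    }
  }

  // usage (sketch; exact API may differ between the Java and Scala DataStream API):
  // val ruleBroadcast = rules.broadcast(ruleDescriptor)
  // events.keyBy(_.key).connect(ruleBroadcast).process(new RuleEvaluator)
}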


2018-07-05 14:41 GMT+02:00 Puneet Kinra :

> Hi Aarti
>
> Flink doesn't support connecting multiple streams with heterogeneous
> schema ,you can try the below solution
>
> a) If stream A is sending some events make the output of that as
> String/JsonString.
>
> b) If stream B is sending some events make the output of that as
> String/JsonString.
>
> c) Now Using union function you can connect all the streams & use FlatMap
> or process function to
> evaluate all these streams against your defined rules.
>
> d) For Storing your aggregations and rules you can build your cache layer
> and pass as a argument
> to the constructor of that flatmap.
>
>
>
>
>
>
>
>
> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:
>
>> Hi,
>>
>> We are currently evaluating Flink to build a real time rule engine that
>> looks at events in a stream and evaluates them against a set of rules.
>>
>> The rules are dynamically configured and can be of three types -
>> 1. Simple Conditions - these require you to look inside a single event.
>> Example, match rule if A happens.
>> 2. Aggregations - these require you to aggregate multiple events.
>> Example, match rule if more than five A's happen.
>> 3. Complex patterns - these require you to look at multiple events and
>> detect patterns. Example, match rule if A happens and then B happens.
>>
>> Since the rules are dynamically configured, we cannot use CEP.
>>
>> As an alternative, we are using connected streams and the CoFlatMap
>> function to store the rules in shared state, and evaluate each incoming
>> event against the stored rules.  Implementation is similar to what's
>> outlined here
>> 
>> .
>>
>> My questions -
>>
>>1. Since the CoFlatMap function works on a single event, how do we
>>evaluate rules that require aggregations across events. (Match rule if 
>> more
>>than 5 A events happen)
>>2. Since the CoFlatMap function works on a single event, how do we
>>evaluate rules that require pattern detection across events (Match rule if
>>A happens, followed by B).
>>3. How do you dynamically define a window function.
>>
>>
>> --Aarti
>>
>>
>> --
>> Aarti Gupta 
>> Director, Engineering, Correlation
>>
>>
>> aagu...@qualys.com
>> T
>>
>>
>> Qualys, Inc. – Blog  | Community
>>  | Twitter 
>>
>>
>> 
>>
>
>
>
> --
> *Cheers *
>
> *Puneet Kinra*
>
> *Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
> *
>
> *e-mail :puneet.ki...@customercentria.com
> *
>
>
>


Re: A use-case for Flink and reactive systems

2018-07-05 Thread Mich Talebzadeh
Hi,

What you are saying is I can either use Flink and forget database layer, or
make a java microservice with a database. Mixing Flink with a Database
doesn't make any sense.

I would have thought that, following the microservices concept, the Flink
application handling streaming data from the upstream microservice is an
independent entity and a microservice on its own, assuming loosely coupled
terminology here.

The database tier is another microservice that interacts with your Flink
application and can serve other consumers. Indeed, in classic CEP engines like
Aleri or StreamBase you may persist your data to an external database for
analytics.

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 5 Jul 2018 at 12:26, Yersinia Ruckeri 
wrote:

> It helps, at least it's fairly clear now.
> I am not against storing the state into Flink, but as per your first
> point, I need to get it persisted, asynchronously, in an external database
> too to let other possible application/services to retrieve the state.
> What you are saying is I can either use Flink and forget database layer,
> or make a java microservice with a database. Mixing Flink with a Database
> doesn't make any sense.
> My concerns with the database is how do you work out the previous state to
> calculate the new one? Is it good and fast? (moving money from account A to
> B isn't a problem cause you have two separate events).
>
> Moreover, a second PoC I was considering is related to Flink CEP. Let's
> say I am elaborating sensor data, I want to have a rule which is working on
> the following principle:
> - If the temperature is more than 40
> - If the temperature yesterday at noon was more than 40
> - If no one used vents yesterday and two days ago
> then do something/emit some event.
>
> This simple CEP example requires you to mine previous data/states from a
> DB, right? Can Flink be considered for it without an external DB but only
> relying on its internal RockDB ?
>
> Hope I am not generating more confusion here.
>
> Y
>
> On Thu, Jul 5, 2018 at 11:21 AM, Fabian Hueske  wrote:
>
>> Hi Yersinia,
>>
>> The main idea of an event-driven application is to hold the state (i.e.,
>> the account data) in the streaming application and not in an external
>> database like Couchbase.
>> This design is very scalable (state is partitioned) and avoids look-ups
>> from the external database because all state interactions are local.
>> Basically, you are moving the database into the streaming application.
>>
>> There are a few things to consider if you maintain the state in the
>> application:
>> - You might need to mirror the state in an external database to make it
>> queryable and available while the streaming application is down.
>> - You need to have a good design to ensure that your consistency
>> requirements are met in case of a failure (upsert writes can temporarily
>> reset the external state).
>> - The design becomes much more challenging if you need to access the
>> state of two accounts to perform a transaction (subtract from the first and
>> add to the second account) because Flink state is distributed per key and
>> does not support remote lookups.
>>
>> If you do not want to store the state in the Flink application, I agree
>> with Jörn that there's no need for Flink.
>>
>> Hope this helps,
>> Fabian
>>
>> 2018-07-05 10:45 GMT+02:00 Yersinia Ruckeri :
>>
>>> My idea started from here: https://flink.apache.org/usecases.html
>>> First use case describes what I am trying to realise (
>>> https://flink.apache.org/img/usecases-eventdrivenapps.png)
>>> My application is Flink, listening to incoming events, changing the
>>> state of an object (really an aggregate here) and pushing out another event.
>>> States can be persisted asynchronously; this is ok.
>>>
>>> My point on top to this picture is that the "state" it's not just
>>> persisting something, but retrieving the current state, manipulate it new
>>> information and persist the updated state.
>>>
>>>
>>> Y
>>>
>>> On Wed, Jul 4, 2018 at 10:16 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Looks interesting.

 As I understand you have a microservice based on ingestion where a
 topic is defined for streaming messages that include transactional data.
 These transactions should already exist in your DB. For now we look at DB
 as part of your microservices and we take a logical view of it.

 So


1. First microservice M1 provides ingestion of kafka yopic

Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Aarti Gupta
+Ken.

--Aarti

On Thu, Jul 5, 2018 at 6:48 PM, Aarti Gupta  wrote:

> Thanks everyone, will take a look.
>
> --Aarti
>
> On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske  wrote:
>
>> Hi,
>>
>> > Flink doesn't support connecting multiple streams with heterogeneous
>> schema
>>
>> This is not correct.
>> Flink is very well able to connect streams with different schema.
>> However, you cannot union two streams with different schema.
>> In order to reconfigure an operator with changing rules, you can use
>> BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].
>>
>> In order to dynamically reconfigure aggregations and windowing, you would
>> need to implement the processing logic yourself in the process function
>> using state and timers.
>> There is no built-in support to reconfigure such operators.
>>
>> Best,
>> Fabian
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>> dev/stream/state/broadcast_state.html
>>
>>
>> 2018-07-05 14:41 GMT+02:00 Puneet Kinra > >:
>>
>>> Hi Aarti
>>>
>>> Flink doesn't support connecting multiple streams with heterogeneous
>>> schema ,you can try the below solution
>>>
>>> a) If stream A is sending some events make the output of that as
>>> String/JsonString.
>>>
>>> b) If stream B is sending some events make the output of that as
>>> String/JsonString.
>>>
>>> c) Now Using union function you can connect all the streams & use
>>> FlatMap or process function to
>>> evaluate all these streams against your defined rules.
>>>
>>> d) For Storing your aggregations and rules you can build your cache
>>> layer and pass as a argument
>>> to the constructor of that flatmap.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:
>>>
 Hi,

 We are currently evaluating Flink to build a real time rule engine that
 looks at events in a stream and evaluates them against a set of rules.

 The rules are dynamically configured and can be of three types -
 1. Simple Conditions - these require you to look inside a single event.
 Example, match rule if A happens.
 2. Aggregations - these require you to aggregate multiple events.
 Example, match rule if more than five A's happen.
 3. Complex patterns - these require you to look at multiple events and
 detect patterns. Example, match rule if A happens and then B happens.

 Since the rules are dynamically configured, we cannot use CEP.

 As an alternative, we are using connected streams and the CoFlatMap
 function to store the rules in shared state, and evaluate each incoming
 event against the stored rules.  Implementation is similar to what's
 outlined here
 
 .

 My questions -

1. Since the CoFlatMap function works on a single event, how do we
evaluate rules that require aggregations across events. (Match rule if 
 more
than 5 A events happen)
2. Since the CoFlatMap function works on a single event, how do we
evaluate rules that require pattern detection across events (Match rule 
 if
A happens, followed by B).
3. How do you dynamically define a window function.


 --Aarti


 --
 Aarti Gupta 
 Director, Engineering, Correlation


 aagu...@qualys.com
 T


 Qualys, Inc. – Blog  | Community
  | Twitter 


 

>>>
>>>
>>>
>>> --
>>> *Cheers *
>>>
>>> *Puneet Kinra*
>>>
>>> *Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
>>> *
>>>
>>> *e-mail :puneet.ki...@customercentria.com
>>> *
>>>
>>>
>>>
>>
>
>
> --
> Aarti Gupta 
> Director, Engineering, Correlation
>
>
> aagu...@qualys.com
> T
>
>
> Qualys, Inc. – Blog  | Community
>  | Twitter 
>
>
> 
>



-- 
Aarti Gupta 
Director, Engineering, Correlation


aagu...@qualys.com
T


Qualys, Inc. – Blog  | Community
 | Twitter 





Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Aarti Gupta
Thanks everyone, will take a look.

--Aarti

On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske  wrote:

> Hi,
>
> > Flink doesn't support connecting multiple streams with heterogeneous
> schema
>
> This is not correct.
> Flink is very well able to connect streams with different schema. However,
> you cannot union two streams with different schema.
> In order to reconfigure an operator with changing rules, you can use
> BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].
>
> In order to dynamically reconfigure aggregations and windowing, you would
> need to implement the processing logic yourself in the process function
> using state and timers.
> There is no built-in support to reconfigure such operators.
>
> Best,
> Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> release-1.5/dev/stream/state/broadcast_state.html
>
>
> 2018-07-05 14:41 GMT+02:00 Puneet Kinra 
> :
>
>> Hi Aarti
>>
>> Flink doesn't support connecting multiple streams with heterogeneous
>> schema ,you can try the below solution
>>
>> a) If stream A is sending some events make the output of that as
>> String/JsonString.
>>
>> b) If stream B is sending some events make the output of that as
>> String/JsonString.
>>
>> c) Now Using union function you can connect all the streams & use
>> FlatMap or process function to
>> evaluate all these streams against your defined rules.
>>
>> d) For Storing your aggregations and rules you can build your cache layer
>> and pass as a argument
>> to the constructor of that flatmap.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:
>>
>>> Hi,
>>>
>>> We are currently evaluating Flink to build a real time rule engine that
>>> looks at events in a stream and evaluates them against a set of rules.
>>>
>>> The rules are dynamically configured and can be of three types -
>>> 1. Simple Conditions - these require you to look inside a single event.
>>> Example, match rule if A happens.
>>> 2. Aggregations - these require you to aggregate multiple events.
>>> Example, match rule if more than five A's happen.
>>> 3. Complex patterns - these require you to look at multiple events and
>>> detect patterns. Example, match rule if A happens and then B happens.
>>>
>>> Since the rules are dynamically configured, we cannot use CEP.
>>>
>>> As an alternative, we are using connected streams and the CoFlatMap
>>> function to store the rules in shared state, and evaluate each incoming
>>> event against the stored rules.  Implementation is similar to what's
>>> outlined here
>>> 
>>> .
>>>
>>> My questions -
>>>
>>>1. Since the CoFlatMap function works on a single event, how do we
>>>evaluate rules that require aggregations across events. (Match rule if 
>>> more
>>>than 5 A events happen)
>>>2. Since the CoFlatMap function works on a single event, how do we
>>>evaluate rules that require pattern detection across events (Match rule 
>>> if
>>>A happens, followed by B).
>>>3. How do you dynamically define a window function.
>>>
>>>
>>> --Aarti
>>>
>>>
>>> --
>>> Aarti Gupta 
>>> Director, Engineering, Correlation
>>>
>>>
>>> aagu...@qualys.com
>>> T
>>>
>>>
>>> Qualys, Inc. – Blog  | Community
>>>  | Twitter 
>>>
>>>
>>> 
>>>
>>
>>
>>
>> --
>> *Cheers *
>>
>> *Puneet Kinra*
>>
>> *Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
>> *
>>
>> *e-mail :puneet.ki...@customercentria.com
>> *
>>
>>
>>
>


-- 
Aarti Gupta 
Director, Engineering, Correlation


aagu...@qualys.com
T


Qualys, Inc. – Blog  | Community
 | Twitter 





1.5.1

2018-07-05 Thread Vishal Santoshi
We are planning to go to 1.5.0 next week (from 1.4.0). I could not get a
list of JIRA issues that would be addressed in the 1.5.1 release, which
seems imminent. Could anyone inform the forum of the 1.5.1 bug fixes and
tell us a timeline for the 1.5.1 release?

Thanks much


Limiting in flight data

2018-07-05 Thread Vishal Santoshi
"Yes, Flink 1.5.0 will come with better tools to handle this problem.
Namely you will be able to limit the “in flight” data, by controlling the
number of assigned credits per channel/input gate. Even without any
configuring Flink 1.5.0 will out of the box buffer less data, thus
mitigating the problem."

I read this in another email chain. The docs (maybe you can point me to
them) are not very clear on how to do the above. Any pointers will be
appreciated.
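The only settings I could find so far that seem related are the network buffer
options in flink-conf.yaml (not sure these are the right knobs, so please
correct me):

# exclusive buffers per channel (credit-based flow control), default 2
taskmanager.network.memory.buffers-per-channel: 2
# additional floating buffers shared per input gate, default 8
taskmanager.network.memory.floating-buffers-per-gate: 8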

Thanks much.


Re: 1.5.1

2018-07-05 Thread Chesnay Schepler
Release notes: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12343053


I'm currently building the release artifacts, if everything goes 
smoothly it should be released next week.


On 05.07.2018 16:16, Vishal Santoshi wrote:
We are planning to go to 1.5.0 next week ( from 1.4.0 ). I could not 
get a list of jira issues that would be addressed in the 1.5.1 release 
which seems imminent. Could anyone inform the forum of the 1.5.1 bug 
fixes and tell us a time line for the 1.5.1 release ?


Thanks much





Re: Description of Flink event time processing

2018-07-05 Thread Fabian Hueske
Hi Elias,

Thanks for the great document!
I made a pass over it and left a few comments.

I think we should definitely add this to the documentation.

Thanks,
Fabian

2018-07-04 10:30 GMT+02:00 Fabian Hueske :

> Hi Elias,
>
> I agree, the docs lack a coherent discussion of event time features.
> Thank you for this write up!
> I just skimmed your document and will provide more detailed feedback later.
>
> It would be great to add such a page to the documentation.
>
> Best, Fabian
>
> 2018-07-03 3:07 GMT+02:00 Elias Levy :
>
>> The documentation of how Flink handles event time and watermarks is
>> spread across several places.  I've been wanting a single location that
>> summarizes the subject, and as none was available, I wrote one up.
>>
>> You can find it here: https://docs.google.com/
>> document/d/1b5d-hTdJQsPH3YD0zTB4ZqodinZVHFomKvt41FfUPMc/edit?usp=sharing
>>
>> I'd appreciate feedback, particularly about the correctness of the
>> described behavior.
>>
>
>


Re: Flink Kafka TimeoutException

2018-07-05 Thread Ted Yu
Have you tried increasing the request.timeout.ms parameter (Kafka) ?

Which Flink / Kafka release are you using ?

Cheers

On Thu, Jul 5, 2018 at 5:39 AM Amol S - iProgrammer 
wrote:

> Hello,
>
> I am using flink with kafka and getting below exception.
>
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for
> helloworld.t-7: 30525 ms has passed since last append
>
> ---
> *Amol Suryawanshi*
> Java Developer
> am...@iprogrammer.com
>
>
> *iProgrammer Solutions Pvt. Ltd.*
>
>
>
> *Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
> Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune - 411016,
> MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
> www.iprogrammer.com 
> 
>


Re: A use-case for Flink and reactive systems

2018-07-05 Thread Yersinia Ruckeri
I guess my initial bad explanation caused confusion.
After reading the docs again I got your points. I can use Flink for online
stream processing, letting it manage the state, which can be persisted
asynchronously in a DB to support savepoints, and use queryable state to
make the current state available for queries (I know this API can
change, but let's assume it's OK for now).
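
If I understand it correctly, that would look roughly like the sketch below
(assuming transactions arrive as Tuple2<accountId, amount>; names are
placeholders, and the query side is only indicated in a comment):

// Fragment, e.g. inside the job's main() method.
ReducingStateDescriptor<Tuple2<String, Long>> balanceDescriptor =
        new ReducingStateDescriptor<>(
                "balance",
                (a, b) -> Tuple2.of(a.f0, a.f1 + b.f1),   // accumulate amounts
                TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {}));

transactions
        .keyBy(t -> t.f0)                                 // key by account id
        .asQueryableState("balances", balanceDescriptor);

// Other services would then look up a balance by account id through
// QueryableStateClient (pointing at a TaskManager's queryable-state proxy)
// instead of reading it from an external database.
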
What I am curious about is state size: what if I plan to use it in a truly
big environment? Take the financial example: if I am BNP Paribas, HSBC or
VISA and I want to process incoming transactions to update balances,
Flink's job is to receive transactions and update the balance (state).
However, I have 100 million accounts, so even scaling Flink out I might
hit some storage limit.
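
(If I read the docs correctly, state larger than memory is what the RocksDB
state backend is meant for: state lives on local disk and only checkpoints go
to a DFS. A minimal sketch, with a placeholder checkpoint path:)

// Fragment, e.g. inside the job's main() method.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Keyed state is kept in embedded RocksDB on local disk, so it is not bounded
// by heap size; incremental checkpoints go to the given (placeholder) URI.
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));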

My research around Flink was to investigate two cases:
1. See if and how Flink can be put into an event-sourcing-based application
using CQRS to process an ongoing flow of events without specifically coding
an application (which might be a microservice) that makes small upserts into
a DB (keep the case of a constant flow of transactions that determine a
balance update in a dedicated service).
2. Using CEP to trigger specific events based on a behaviour that has been
observed over time. Take the case of the sensors I described, or a
supermarket points system: I want to give 1k points to all customers who
bought tuna during the last 3 months, spent more than 100 euro per week, and
have had the supermarket mobile app installed since the beginning of the
year. I want to do this online, processing the flow, rather than triggering
an offline routine that mines the behaviour (a rough CEP sketch follows
below).
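
For case 2, I imagine something along these lines with Flink CEP (the Event
type, its fields and the simplified two-step pattern are invented for
illustration, and the pattern is compiled statically rather than configured
dynamically):

// Fragment; assumes a DataStream<Event> named "events".
Pattern<Event, ?> pattern = Pattern.<Event>begin("buysTuna")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "TUNA_PURCHASE".equals(e.getType());
            }
        })
        .followedBy("installsApp")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return "APP_INSTALL".equals(e.getType());
            }
        })
        .within(Time.days(90));                       // "during the last 3 months"

PatternStream<Event> matches =
        CEP.pattern(events.keyBy(Event::getCustomerId), pattern);
// matches.select(...) would then emit the "give 1k points" event; the state
// CEP needs for this lives in Flink's own (e.g. RocksDB) state backend.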

I got stuck on the understanding that Flink doesn't work with an external
DB, except for RocksDB, which it can use internally. Hence, unless I
redefine my cases (which I aim to do), Flink isn't the best choice here.

Y

On Thu, Jul 5, 2018 at 2:09 PM, Mich Talebzadeh 
wrote:

> Hi,
>
> What you are saying is I can either use Flink and forget database layer,
> or make a java microservice with a database. Mixing Flink with a Database
> doesn't make any sense.
>
> I would have thought that moving with microservices concept, Flink
> handling streaming data from the upstream microservice is an independent
> entity and microservice on its own assuming loosely coupled terminologies
> here
>
> Database tier is another microservice that interacts with your Flink and
> can serve other consumers. Indeed in classic CEP like Aleri or StreamBase
> you may persist your data to an external database for analytics.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 5 Jul 2018 at 12:26, Yersinia Ruckeri 
> wrote:
>
>> It helps, at least it's fairly clear now.
>> I am not against storing the state into Flink, but as per your first
>> point, I need to get it persisted, asynchronously, in an external database
>> too to let other possible application/services to retrieve the state.
>> What you are saying is I can either use Flink and forget database layer,
>> or make a java microservice with a database. Mixing Flink with a Database
>> doesn't make any sense.
>> My concerns with the database is how do you work out the previous state
>> to calculate the new one? Is it good and fast? (moving money from account A
>> to B isn't a problem cause you have two separate events).
>>
>> Moreover, a second PoC I was considering is related to Flink CEP. Let's
>> say I am elaborating sensor data, I want to have a rule which is working on
>> the following principle:
>> - If the temperature is more than 40
>> - If the temperature yesterday at noon was more than 40
>> - If no one used vents yesterday and two days ago
>> then do something/emit some event.
>>
>> This simple CEP example requires you to mine previous data/states from a
>> DB, right? Can Flink be considered for it without an external DB but only
>> relying on its internal RocksDB?
>>
>> Hope I am not generating more confusion here.
>>
>> Y
>>
>> On Thu, Jul 5, 2018 at 11:21 AM, Fabian Hueske  wrote:
>>
>>> Hi Yersinia,
>>>
>>> The main idea of an event-driven application is to hold the state (i.e.,
>>> the account data) in the streaming application and not in an external
>>> database like Couchbase.
>>> This design is very scalable (state is partitioned) and avoids look-ups
>>> from the external database because all state interactions are local.
>>> Basically, you are moving the database into the streaming application.
>>>
>>> There are a few t

Re: Flink Kafka TimeoutException

2018-07-05 Thread ashish pok
Our experience has been that even when the Kafka cluster is healthy, JVM resource 
contention in our Flink app, caused by high heap utilization and CPU cycles lost 
to GC, can also result in this issue. Getting basic JVM metrics such as CPU load, 
GC times and heap utilization from your app (we use the Graphite reporter) might 
help point out whether you have the same issue.


- Ashish

On Thursday, July 5, 2018, 11:46 AM, Ted Yu  wrote:

Have you tried increasing the request.timeout.ms parameter (Kafka) ?
Which Flink / Kafka release are you using ?
Cheers
On Thu, Jul 5, 2018 at 5:39 AM Amol S - iProgrammer  
wrote:

Hello,

I am using flink with kafka and getting below exception.

org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for
helloworld.t-7: 30525 ms has passed since last append

---
*Amol Suryawanshi*
Java Developer
am...@iprogrammer.com


*iProgrammer Solutions Pvt. Ltd.*



*Office 103, 104, 1st Floor Pride Portal,Shivaji Housing Society,
Bahiratwadi,Near Hotel JW Marriott, Off Senapati Bapat Road, Pune - 411016,
MH, INDIA.**Phone: +91 9689077510 | Skype: amols_iprogrammer*
www.iprogrammer.com 







Re: Passing type information to JDBCAppendTableSink

2018-07-05 Thread Rong Rong
+1 to this answer.

MERGE is the most compatible syntax I have found when dealing with upsert /
replace.

AFAIK, almost all DBMSs have some kind of dialect regarding upsert
functionality, so following the SQL standard might be your best solution
here.
And yes, both the MERGE ingestion SQL and the execution logs are going to be
more complex.
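
For example, a sketch of how an upsert-style statement could be plugged into
the existing JDBCAppendTableSink (table, columns and the MySQL-specific "ON
DUPLICATE KEY UPDATE" dialect are only placeholders; Derby would need a MERGE
statement instead):

// Fragment; the parameter types must match the order of the ? placeholders.
JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
        .setDrivername("com.mysql.jdbc.Driver")
        .setDBUrl("jdbc:mysql://localhost:3306/analytics")
        .setQuery("INSERT INTO page_counts (user_name, cnt) VALUES (?, ?) "
                + "ON DUPLICATE KEY UPDATE cnt = VALUES(cnt)")
        .setParameterTypes(Types.STRING, Types.LONG)
        .build();

resultTable.writeToSink(sink);   // resultTable is a placeholder Table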

--
Rong

On Wed, Jul 4, 2018 at 1:15 AM Fabian Hueske  wrote:

> There is also the SQL:2003 MERGE statement that can be used to implement
> UPSERT logic.
> It is a bit verbose but supported by Derby [1].
>
> Best, Fabian
>
> [1] https://issues.apache.org/jira/browse/DERBY-3155
>
> 2018-07-04 10:10 GMT+02:00 Fabian Hueske :
>
>> Hi Chris,
>>
>> MySQL (and maybe other DBMS as well) offers special syntax for upserts.
>>
>> The answers to this SO question [1] recommend "INSERT INTO ... ON
>> DUPLICATE KEY UPDATE ..." or "REPLACE INTO ...".
>> However, AFAIK this syntax is not standardized and might vary from DBMS
>> to DBMS.
>>
>> Best, Fabian
>>
>> [1]
>> https://stackoverflow.com/questions/4205181/insert-into-a-mysql-table-or-update-if-exists
>>
>> 2018-07-03 12:14 GMT+02:00 Chris Ruegger :
>>
>>> Fabian, Rong:
>>> Thanks for the help, greatly appreciated.
>>>
>>> I am currently using a Derby database for the append-only JDBC sink.
>>> So far I don't see a way to use a JDBC/relational database solution for
>>> a retract/upsert use case?
>>> Is it possible to set up JDBC sink with Derby or MySQL so that it goes
>>> back and updates or deletes/inserts previous rows and inserts new ones?
>>> I have not been able to find example source code that does that.
>>> Thanks again,
>>> Chris
>>>
>>>
>>> On Tue, Jul 3, 2018 at 5:24 AM, Fabian Hueske  wrote:
>>>
 Hi,

 In addition to what Rong said:

 - The types look OK.
 - You can also use Types.STRING, and Types.LONG instead of
 BasicTypeInfo.xxx
 - Beware that in the failure case, you might have multiple entries in
 the database table. Some databases support an upsert syntax which (together
 with key or uniqueness constraints) can ensure that each result is added
 just once, even if the query recovers from a failure.

 Best, Fabian

 2018-07-01 17:25 GMT+02:00 Rong Rong :

> Hi Chris,
>
> Looking at the code, seems like JDBCTypeUtil [1] is used for
> converting Flink TypeInformation into JDBC Type (Java.sql.type), and
> SQL_TIMESTAMP and SQL_TIME are both listed in the conversion mapping.
> However the JDBC types are different.
>
> Regarding the question whether your insert is correctly configured. It
> directly relates to how your DB executes the JDBC insert command.
> 1. Regarding type settings: Looking at the JDBCOutputFormat [2], seems
> like you can even execute your command without type array or type mapping
> cannot be found, in this case the PrepareStatement will be written with
> plain Object type. I tried it on MySQL and it actually works pretty well.
> 2. Another question is whether your underlying DB can handle "implicit
> type cast": For example, inserting an INTEGER type into a BIGINT column.
> AFAIK JDBCAppendableSink does not check compatibilities before 
> writeRecord,
> so it might be a good idea to include some sanity check beforehand.
>
> Thanks,
> Rong
>
> [1]
> https://github.com/apache/flink/blob/master/flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCTypeUtil.java
> [2]
> https://github.com/apache/flink/blob/master/flink-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java#L109
>
> On Sun, Jul 1, 2018 at 5:22 AM chrisr123 
> wrote:
>
>>
>> Full Source except for mapper and timestamp assigner.
>>
>> Sample Input Stream record:
>> 1530447316589,Mary,./home
>>
>>
>> What are the correct parameters to pass for data types in the
>> JDBCAppendTableSink?
>> Am I doing this correctly?
>>
>>
>> // Get Execution Environment
>> StreamExecutionEnvironment env =
>> StreamExecutionEnvironment.getExecutionEnvironment();
>>
>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
>> StreamTableEnvironment tableEnvironment =
>> TableEnvironment.getTableEnvironment(env);
>>
>> // Get and Set execution parameters.
>> ParameterTool parms = ParameterTool.fromArgs(args);
>> env.getConfig().setGlobalJobParameters(parms);
>>
>> // Configure Checkpoint and Restart
>> // configureCheckpoint(env);
>> // configureRestart(env);
>>
>> // Get Our Data Stream
>> DataStream> eventStream
>> = env
>> .socketTextStream(parms.get("host"),
>> parms.getInt("port"

Re: Slide Window Compute Optimization

2018-07-05 Thread Rong Rong
Hi Yennie,

AFAIK, the sliding window will in fact duplicate elements into multiple
different streams. There's a discussion thread regarding this [1].
We are looking into some performance improvements; can you provide some more
info regarding your use case?

--
Rong

[1] https://issues.apache.org/jira/browse/FLINK-7001

On Thu, Jul 5, 2018 at 3:30 AM Kostas Kloudas 
wrote:

> Hi,
>
> You are correct that with sliding windows you will have 3600 “open
> windows” at any point.
> Could you describe a bit more what you want to do?
>
> If you simply want to have an update of something like a counter every
> second, then you can
> implement your own logic with a ProcessFunction that allows to handle
> state and timers in a
> custom way (see [1]).
>
> Hope this helps,
> Kostas
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/operators/process_function.html
>
>
> On Jul 5, 2018, at 12:12 PM, YennieChen88  wrote:
>
> Hi,
>I want to use sliding windows with a 1-hour window size and a 1-second step
> size. I found that once an element arrives, it is processed in 3600
> windows serially through one thread. It takes several seconds to finish
> processing one element, much more than I expected. Is there any way to
> optimize it?
>Thank you very much for your reply.
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
>
>


Re: Description of Flink event time processing

2018-07-05 Thread Rong Rong
Hi Elias,

Thanks for putting together the document. This is actually a very good,
well-rounded document.
I think you did not enable comment access on the link. Would you
mind enabling comments for the Google doc?

Thanks,
Rong


On Thu, Jul 5, 2018 at 8:39 AM Fabian Hueske  wrote:

> Hi Elias,
>
> Thanks for the great document!
> I made a pass over it and left a few comments.
>
> I think we should definitely add this to the documentation.
>
> Thanks,
> Fabian
>
> 2018-07-04 10:30 GMT+02:00 Fabian Hueske :
>
>> Hi Elias,
>>
>> I agree, the docs lack a coherent discussion of event time features.
>> Thank you for this write up!
>> I just skimmed your document and will provide more detailed feedback
>> later.
>>
>> It would be great to add such a page to the documentation.
>>
>> Best, Fabian
>>
>> 2018-07-03 3:07 GMT+02:00 Elias Levy :
>>
>>> The documentation of how Flink handles event time and watermarks is
>>> spread across several places.  I've been wanting a single location that
>>> summarizes the subject, and as none was available, I wrote one up.
>>>
>>> You can find it here:
>>> https://docs.google.com/document/d/1b5d-hTdJQsPH3YD0zTB4ZqodinZVHFomKvt41FfUPMc/edit?usp=sharing
>>>
>>> I'd appreciate feedback, particularly about the correctness of the
>>> described behavior.
>>>
>>
>>
>


Re: Limiting in flight data

2018-07-05 Thread Zhijiang(wangzhijiang999)
Hi Vishal,

Before Flink 1.5.0, the sender tries its best to send data over the network until 
the wire is filled with data. Starting with Flink 1.5.0, network flow control is 
improved by a credit-based mechanism. That means the sender transfers data based on 
how many buffers are available on the receiver side, so no data accumulates on the 
wire. From this point of view, there is less in-flight data than before.

You can also further limit the in-flight data by controlling the number of 
credits on the receiver side; the related parameters are 
taskmanager.network.memory.buffers-per-channel and 
taskmanager.network.memory.floating-buffers-per-gate.
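
For illustration, a local-environment sketch (the values shown are just the
1.5.0 defaults; on a cluster these keys would normally be set in
flink-conf.yaml rather than in code):

// Fragment: lowering the per-channel and floating credits caps the amount of
// in-flight data; the 1.5.0 defaults are 2 and 8 respectively.
Configuration conf = new Configuration();
conf.setInteger("taskmanager.network.memory.buffers-per-channel", 2);
conf.setInteger("taskmanager.network.memory.floating-buffers-per-gate", 8);

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(4, conf);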

If you have other questions about them, let me know and I can explain further.

Zhijiang
--
From: Vishal Santoshi 
Sent: Thursday, July 5, 2018, 22:28
To: user 
Subject: Limiting in flight data

"Yes, Flink 1.5.0 will come with better tools to handle this problem. Namely 
you will be able to limit the “in flight” data, by controlling the number of 
assigned credits per channel/input gate. Even without any configuring Flink 
1.5.0 will out of the box buffer less data, thus mitigating the problem."

I read this in another email chain. The docs ( may be you can point me to them 
) are not very clear on how to do the above. Any pointers will be appreciated.

Thanks much.



Re: Handling back pressure in Flink.

2018-07-05 Thread Zhijiang(wangzhijiang999)
Hi Mich,

Starting with Flink 1.5.0, network flow control is improved by a credit-based 
mechanism which handles backpressure better than before. The producer sends data 
based on the number of available buffers (credits) on the consumer side. If the 
processing time on the consumer side is slower than the producing time on the 
producer side, the data will be cached in the output and input queue memories of 
both sides, which may trigger back pressure on the producer side. You can increase 
the number of credits on the consumer side to relieve back pressure. Even if back 
pressure happens, the application is still stable (it will not cause OOM), so I 
think you should not worry about that. Normally it is better to consider the TPS 
of both sides and set a proper parallelism to avoid back pressure to some extent.

Zhijiang
--
From: Mich Talebzadeh 
Sent: Wednesday, July 4, 2018, 20:40
To: user 
Subject: Handling back pressure in Flink.

Hi,

In Spark one can handle backpressure by setting the Spark conf parameter:

sparkConf.set("spark.streaming.backpressure.enabled","true")

With backpressure you make the Spark Streaming application stable, i.e. it receives 
data only as fast as it can process it. In general one needs to ensure that the 
micro-batch processing time is less than the batch interval, i.e. the rate at which 
the producer sends data into Kafka. For example, this is shown in the Spark GUI 
below for a batch interval of 2 seconds.



Is there such a procedure in Flink, please?

Thanks


Dr Mich Talebzadeh

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk.Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 






Re: Slide Window Compute Optimization

2018-07-05 Thread YennieChen88
Hi Kostas and Rong,
Thank you for your reply.
As both of you ask for more info about my use case, I now reply in
unison.
My case is used for counting the number of successful login and failures
within one hour, keyBy other login related attributes (e.g. ip, device,
login type ...). According to the count result of the previous hour, to
judge whether the next login is compliant. 
We have a high requirement for the flink compute time, to reduce the
impact on user login. From receiving source to sinking results into
database, only about 10ms time is acceptable. Base on this, we expect the
compute result as accurate as possible. The best case without error is the
latest sink time after 1-hour compute exactly the same as the user login
time which need judge compliance. Is that means the smaller the step size of
slide window, the more accurate the results? But Now it seems that the
smaller step size of slide window,the longer time need to compute. Because
once a element arrives, it will be processed in every window (number of
windows = window size/step size)serially through one thread.
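
For reference, my understanding of the ProcessFunction idea Kostas suggested,
stripped down (LoginEvent and its key are invented; expiring events older than
one hour would still need timer-based eviction, which is omitted here):

// One running counter per key, updated and emitted per login event, instead
// of touching window-size/step-size overlapping windows per element.
public class LoginCounter extends ProcessFunction<LoginEvent, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(LoginEvent event, Context ctx,
                               Collector<Tuple2<String, Long>> out) throws Exception {
        long updated = (count.value() == null ? 0L : count.value()) + 1;
        count.update(updated);
        out.collect(Tuple2.of(event.getKey(), updated));
    }
}

// used as: logins.keyBy(LoginEvent::getKey).process(new LoginCounter());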




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Dynamic Rule Evaluation in Flink

2018-07-05 Thread Puneet Kinra
Hi Fabian

I know you can connect 2 streams with heterogeneous schema using connect
function.
that has only one port or one parameter.
can you send more than one heterogeneous stream to connect.
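
For context, my reading of the broadcast-state approach referenced below,
compressed into a sketch (Rule, Event and Alert are placeholder types, and the
rule evaluation itself is left as a comment):

// Fragment; "events" and "rules" are placeholder DataStreams.
MapStateDescriptor<String, Rule> ruleStateDescriptor =
        new MapStateDescriptor<>("rules", String.class, Rule.class);

BroadcastStream<Rule> ruleBroadcast = rules.broadcast(ruleStateDescriptor);

DataStream<Alert> alerts = events
        .keyBy(Event::getKey)
        .connect(ruleBroadcast)
        .process(new KeyedBroadcastProcessFunction<String, Event, Rule, Alert>() {

            @Override
            public void processElement(Event event, ReadOnlyContext ctx,
                                       Collector<Alert> out) throws Exception {
                for (Map.Entry<String, Rule> entry :
                        ctx.getBroadcastState(ruleStateDescriptor).immutableEntries()) {
                    // evaluate "event" against entry.getValue(), collect Alerts
                }
            }

            @Override
            public void processBroadcastElement(Rule rule, Context ctx,
                                                Collector<Alert> out) throws Exception {
                // new/updated rules are stored in broadcast state on every task
                ctx.getBroadcastState(ruleStateDescriptor).put(rule.getId(), rule);
            }
        });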

On Thu, Jul 5, 2018 at 6:37 PM, Fabian Hueske  wrote:

> Hi,
>
> > Flink doesn't support connecting multiple streams with heterogeneous
> schema
>
> This is not correct.
> Flink is very well able to connect streams with different schema. However,
> you cannot union two streams with different schema.
> In order to reconfigure an operator with changing rules, you can use
> BroadcastProcessFunction or KeyedBroadcastProcessFunction [1].
>
> In order to dynamically reconfigure aggregations and windowing, you would
> need to implement the processing logic yourself in the process function
> using state and timers.
> There is no built-in support to reconfigure such operators.
>
> Best,
> Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/broadcast_state.html
>
>
> 2018-07-05 14:41 GMT+02:00 Puneet Kinra 
> :
>
>> Hi Aarti
>>
>> Flink doesn't support connecting multiple streams with heterogeneous
>> schema, you can try the below solution
>>
>> a) If stream A is sending some events make the output of that as
>> String/JsonString.
>>
>> b) If stream B is sending some events make the output of that as
>> String/JsonString.
>>
>> c) Now Using union function you can connect all the streams & use
>> FlatMap or process function to
>> evaluate all these streams against your defined rules.
>>
>> d) For Storing your aggregations and rules you can build your cache layer
>> and pass as a argument
>> to the constructor of that flatmap.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 2, 2018 at 2:38 PM, Aarti Gupta  wrote:
>>
>>> Hi,
>>>
>>> We are currently evaluating Flink to build a real time rule engine that
>>> looks at events in a stream and evaluates them against a set of rules.
>>>
>>> The rules are dynamically configured and can be of three types -
>>> 1. Simple Conditions - these require you to look inside a single event.
>>> Example, match rule if A happens.
>>> 2. Aggregations - these require you to aggregate multiple events.
>>> Example, match rule if more than five A's happen.
>>> 3. Complex patterns - these require you to look at multiple events and
>>> detect patterns. Example, match rule if A happens and then B happens.
>>>
>>> Since the rules are dynamically configured, we cannot use CEP.
>>>
>>> As an alternative, we are using connected streams and the CoFlatMap
>>> function to store the rules in shared state, and evaluate each incoming
>>> event against the stored rules.  Implementation is similar to what's
>>> outlined here
>>> 
>>> .
>>>
>>> My questions -
>>>
>>>1. Since the CoFlatMap function works on a single event, how do we
>>>evaluate rules that require aggregations across events. (Match rule if 
>>> more
>>>than 5 A events happen)
>>>2. Since the CoFlatMap function works on a single event, how do we
>>>evaluate rules that require pattern detection across events (Match rule 
>>> if
>>>A happens, followed by B).
>>>3. How do you dynamically define a window function.
>>>
>>>
>>> --Aarti
>>>
>>>
>>> --
>>> Aarti Gupta 
>>> Director, Engineering, Correlation
>>>
>>>
>>> aagu...@qualys.com
>>> T
>>>
>>>
>>> Qualys, Inc. – Blog  | Community
>>>  | Twitter 
>>>
>>>
>>> 
>>>
>>
>>
>>
>> --
>> *Cheers *
>>
>> *Puneet Kinra*
>>
>> *Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
>> *
>>
>> *e-mail :puneet.ki...@customercentria.com
>> *
>>
>>
>>
>


-- 
*Cheers *

*Puneet Kinra*

*Mobile:+918800167808 | Skype : puneet.ki...@customercentria.com
*

*e-mail :puneet.ki...@customercentria.com
*