Re: Implement Joins with Lookup Data

2018-07-24 Thread Jain, Ankit
How often is the product DB updated? Based on that, you can store product metadata as state in Flink, maybe set up the state on cluster startup and then update it daily, etc. Also, just based on this feature, Flink doesn’t seem to add a lot of value on top of Kafka. As Jorn said below, you can very w

Re: Appending Windowed Aggregates to Events

2017-06-26 Thread Jain, Ankit
You could load the historical data as Flink state and then look up the state with the key derived from the input record. That should serve like a join in the relational world. You may also want to think about keeping the writes and querying isolated. Especially if your windows are going to be long (eg ca
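
A minimal sketch of the lookup-as-join idea in the preview above, written in plain Python rather than against the Flink API. A dict stands in for keyed state, and the record shapes, `build_state`, and `enrich` are hypothetical names chosen for illustration, not anything from the thread.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (key, payload) pairs for both inputs.
Record = Tuple[str, dict]


def build_state(historical: Iterable[Record]) -> Dict[str, dict]:
    """Load historical records into a keyed table (a stand-in for keyed state)."""
    state: Dict[str, dict] = {}
    for key, payload in historical:
        # A later record for the same key overwrites the earlier one.
        state[key] = payload
    return state


def enrich(events: Iterable[Record], state: Dict[str, dict]) -> Iterator[dict]:
    """Look up each event's key in the state, like a left outer join."""
    for key, payload in events:
        # Events with no historical match get lookup=None instead of being dropped.
        yield {**payload, "key": key, "lookup": state.get(key)}
```

In a real job the table would live in the state backend and be partitioned by key; the sketch only shows the shape of the lookup.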

Re: High Availability on Yarn

2017-05-24 Thread Jain, Ankit
Thanks for the reply Robert – I will try out #1 & keep you posted. From: Robert Metzger Date: Wednesday, May 24, 2017 at 7:44 AM To: "Jain, Ankit" Cc: "user@flink.apache.org" Subject: Re: High Availability on Yarn Hi Ankit, I realized I can answer your questions myse

Re: High Availability on Yarn

2017-05-23 Thread Jain, Ankit
Following up on this. From: "Jain, Ankit" Date: Tuesday, May 16, 2017 at 12:14 AM To: Stephan Ewen , "user@flink.apache.org" Subject: Re: High Availability on Yarn Bringing it back to list’s focus. From: "Jain, Ankit" Date: Thursday, May 11, 2017 at 1:

Re: Stateful streaming question

2017-05-16 Thread Jain, Ankit
Hi Flavio, While you wait on an update from Kostas, wanted to understand the use-case better and share my thoughts: 1) Why is the current batch mode expensive? Where are you persisting the data after updates? The way I see it, by moving to Flink, you get to use RocksDB (a key-value store) that mak

Re: High Availability on Yarn

2017-05-16 Thread Jain, Ankit
Bringing it back to list’s focus. From: "Jain, Ankit" Date: Thursday, May 11, 2017 at 1:19 PM To: Stephan Ewen , "user@flink.apache.org" Subject: Re: High Availability on Yarn Got the answer on #2, looks like that will work, still looking for suggestions on #1. Thank

Re: static/dynamic lookups in flink streaming

2017-05-15 Thread Jain, Ankit
What if we copy the big data set to HDFS on start of the cluster (eg EMR if using AWS) and then use that to build distributed operator state in Flink instead of calling the external store? How do Flink contributors feel about that? Thanks Ankit On 5/14/17, 8:17 PM, "yunfan123" wrote: The 1

Re: Storage options for RocksDBStateBackend

2017-05-15 Thread Jain, Ankit
Also, I hope state & checkpointing writes to S3 happen async w/o impacting the actual job execution graph? If so, will there still be a performance impact from using S3? Thanks Ankit From: Ayush Goyal Date: Thursday, May 11, 2017 at 11:21 PM To: Stephan Ewen , Till Rohrmann Cc: user Subject

Re: High Availability on Yarn

2017-05-11 Thread Jain, Ankit
Got the answer on #2, looks like that will work, still looking for suggestions on #1. Thanks Ankit From: "Jain, Ankit" Date: Thursday, May 11, 2017 at 8:26 AM To: Stephan Ewen , "user@flink.apache.org" Subject: Re: High Availability on Yarn Following up further on th

Re: High Availability on Yarn

2017-05-11 Thread Jain, Ankit
) Regarding elasticity, I know for now a running Flink cluster can’t make use of new hosts added to EMR, but I am guessing Yarn will still see the new hosts and new Flink jobs can make use of them, is that right? Thanks Ankit From: "Jain, Ankit" Date: Monday, May 8, 2017 at

Re: High Availability on Yarn

2017-05-08 Thread Jain, Ankit
Aljoscha was guessing below or only for leader election in case of accidentally running multiple Application Masters? Thanks Ankit From: Stephan Ewen Date: Monday, May 8, 2017 at 9:00 AM To: "user@flink.apache.org" , "Jain, Ankit" Subject: Re: High Availability on Yarn @

Re: High Availability on Yarn

2017-05-05 Thread Jain, Ankit
keeper instance for ALL Flink jobs? Or is the recommendation to have a dedicated Zookeeper instance per Flink job? Thanks Ankit From: Aljoscha Krettek Date: Thursday, May 4, 2017 at 1:19 AM To: "Jain, Ankit" Cc: "user@flink.apache.org" , Till Rohrmann Subject: Re: High Availabi

Re: High Availability on Yarn

2017-05-03 Thread Jain, Ankit
nkit From: Aljoscha Krettek Date: Wednesday, May 3, 2017 at 2:05 AM To: "Jain, Ankit" Cc: "user@flink.apache.org" , Till Rohrmann Subject: Re: High Availability on Yarn Hi, As a first comment, the work mentioned in the FLIP-6 doc you linked is still work-in-progre

High Availability on Yarn

2017-05-01 Thread Jain, Ankit
Hi fellow users, We are trying to straighten out the high availability story for Flink. Our setup includes a long-running EMR cluster; job submission is a two-step process – 1) A Flink cluster is first created using the Flink Yarn client on the already running EMR cluster 2) The Flink job is submitted. I als

Re: Cross operation on two huge datasets

2017-03-02 Thread Jain, Ankit
could parse all my shapes in an RTree before handling my points, without relying on static Or any other way that would allow me to do something similar. Thanks in advance for your insight. Gwen’ From: Jain, Ankit [mailto:ankit.j...@here.com] Sent: j

Re: Cross operation on two huge datasets

2017-02-23 Thread Jain, Ankit
Hi Gwen, I would recommend looking into a data structure called RTree that is designed specifically for this use case, i.e., matching a point to a region. Thanks Ankit From: Fabian Hueske Date: Wednesday, February 22, 2017 at 2:41 PM To: Subject: Re: Cross operation on two huge datasets Hi Gwen,