How often is the product DB updated? Based on that, you can store product
metadata as state in Flink, e.g. set up the state on cluster startup and then
update it daily.
Also, just based on this feature, Flink doesn’t seem to add a lot of value on
top of Kafka. As Jorn said below, you can very w
You could load the historical data as Flink state and then look up the state
with a key derived from the input record.
That would serve like a join in the relational world.
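The pattern suggested above — keep the historical data in a keyed store and enrich each incoming record with a lookup — can be sketched in plain Python. This is an illustration of the idea only, not Flink API: the `StateJoin` class and field names like `product_id` are made up for the example; in Flink the dict would be managed keyed state (e.g. backed by RocksDB).

```python
# Sketch of the "state as join" pattern: historical records sit in a
# keyed store, and each incoming event is enriched by a lookup on its
# key -- the streaming analogue of a relational (left outer) join.

class StateJoin:
    def __init__(self):
        self.state = {}  # key -> historical record

    def load_historical(self, records):
        """Preload historical data, e.g. at job startup."""
        for key, value in records:
            self.state[key] = value

    def enrich(self, event):
        """Look up the state with a key derived from the input record."""
        key = event["product_id"]
        meta = self.state.get(key)  # None when there is no match
        return {**event, "metadata": meta}

join = StateJoin()
join.load_historical([("p1", {"name": "widget"}), ("p2", {"name": "gadget"})])
print(join.enrich({"product_id": "p1", "qty": 3}))
```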
You may also want to think about keeping the writes and the querying isolated,
especially if your windows are going to be long (e.g. ca
Thanks for the reply Robert – I will try out #1 & keep you posted.
From: Robert Metzger
Date: Wednesday, May 24, 2017 at 7:44 AM
To: "Jain, Ankit"
Cc: "user@flink.apache.org"
Subject: Re: High Availability on Yarn
Hi Ankit,
I realized I can answer your questions myse
Following up on this.
From: "Jain, Ankit"
Date: Tuesday, May 16, 2017 at 12:14 AM
To: Stephan Ewen , "user@flink.apache.org"
Subject: Re: High Availability on Yarn
Bringing it back to list’s focus.
From: "Jain, Ankit"
Date: Thursday, May 11, 2017 at 1:
Hi Flavio,
While you wait on an update from Kostas, wanted to understand the use-case
better and share my thoughts-
1) Why is current batch mode expensive? Where are you persisting the data
after updates? Way I see it by moving to Flink, you get to use RocksDB(a
key-value store) that mak
Bringing it back to list’s focus.
From: "Jain, Ankit"
Date: Thursday, May 11, 2017 at 1:19 PM
To: Stephan Ewen , "user@flink.apache.org"
Subject: Re: High Availability on Yarn
Got the answer on #2, looks like that will work, still looking for suggestions
on #1.
Thanks
What if we copy the big data set to HDFS on start of the cluster (e.g. EMR if
using AWS) and then use that to build distributed operator state in Flink
instead of calling the external store?
How do Flink contributors feel about that?
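The idea above — bulk-loading a big dataset into distributed operator state rather than querying an external store per record — implies each parallel instance only holds its own slice of the data. A minimal Python sketch of that partitioning, with `subtask_for` as a made-up stand-in for Flink's key-group hashing (not the real implementation):

```python
import zlib

def subtask_for(key, parallelism):
    # Stable key -> subtask assignment; stand-in for Flink's key-group hashing.
    return zlib.crc32(key.encode()) % parallelism

def load_slice(dataset, subtask, parallelism):
    # Each parallel instance keeps only the shard of the bulk data whose
    # keys hash to it, instead of every instance loading everything.
    return {k: v for k, v in dataset if subtask_for(k, parallelism) == subtask}

dataset = [("p1", 10), ("p2", 20), ("p3", 30), ("p4", 40)]
parallelism = 2
shards = [load_slice(dataset, i, parallelism) for i in range(parallelism)]
# Every key lands in exactly one shard:
assert sum(len(s) for s in shards) == len(dataset)
```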
Thanks
Ankit
On 5/14/17, 8:17 PM, "yunfan123" wrote:
The 1
Also, I hope state & checkpointing writes to S3 happen async w/o impacting the
actual job execution graph?
If so, will there still be a performance impact from using S3?
Thanks
Ankit
From: Ayush Goyal
Date: Thursday, May 11, 2017 at 11:21 PM
To: Stephan Ewen , Till Rohrmann
Cc: user
Subject
Got the answer on #2, looks like that will work, still looking for suggestions
on #1.
Thanks
Ankit
From: "Jain, Ankit"
Date: Thursday, May 11, 2017 at 8:26 AM
To: Stephan Ewen , "user@flink.apache.org"
Subject: Re: High Availability on Yarn
Following up further on th
) Regarding elasticity, I know that for now a running Flink cluster can’t
make use of new hosts added to EMR, but I am guessing Yarn will still see
the new hosts and new Flink jobs can make use of them, is that right?
Thanks
Ankit
From: "Jain, Ankit"
Date: Monday, May 8, 2017 at
Aljoscha was
guessing below, or only for leader election in case of accidentally running
multiple Application Masters?
Thanks
Ankit
From: Stephan Ewen
Date: Monday, May 8, 2017 at 9:00 AM
To: "user@flink.apache.org" , "Jain, Ankit"
Subject: Re: High Availability on Yarn
@
Zookeeper instance for ALL Flink jobs?
Or
is the recommendation to have a dedicated Zookeeper instance per Flink job?
Thanks
Ankit
From: Aljoscha Krettek
Date: Thursday, May 4, 2017 at 1:19 AM
To: "Jain, Ankit"
Cc: "user@flink.apache.org" , Till Rohrmann
Subject: Re: High Availability on Yarn
Ankit
From: Aljoscha Krettek
Date: Wednesday, May 3, 2017 at 2:05 AM
To: "Jain, Ankit"
Cc: "user@flink.apache.org" , Till Rohrmann
Subject: Re: High Availability on Yarn
Hi,
As a first comment, the work mentioned in the FLIP-6 doc you linked is still
work-in-progress
Hi fellow users,
We are trying to straighten out the high availability story for Flink.
Our setup includes a long-running EMR cluster; job submission is a two-step
process – 1) a Flink cluster is first created using the Flink yarn client on
the already-running EMR cluster, 2) the Flink job is submitted.
I als
could parse all my shapes in an
RTree before handling my points, without relying on static
Or any other way that would allow me to do something similar.
Thanks in advance for your insight.
Gwen
From: Jain, Ankit [mailto:ankit.j...@here.com]
Sent: j
Hi Gwen,
I would recommend looking into a data structure called an RTree, which is
designed specifically for this use case, i.e. matching a point to a region.
Thanks
Ankit
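The core idea behind the R-tree recommendation above is bounding-box pruning: precompute a cheap bounding box per region, reject non-matches with a fast box test, and only then run the exact containment check. The flat Python sketch below shows just that principle (a real R-tree arranges the boxes in a hierarchy so pruning is logarithmic); region names and rectangle shapes are invented for the example.

```python
# Minimal point-in-region matching via bounding boxes. Regions here are
# axis-aligned rectangles (xmin, ymin, xmax, ymax), so the cheap box
# test is also the exact test; for arbitrary polygons the box test
# would only prune candidates before a precise point-in-polygon check.

def bbox_contains(box, x, y):
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def match_point(regions, x, y):
    """Return the names of all regions containing the point (x, y)."""
    hits = []
    for name, box in regions:
        if bbox_contains(box, x, y):
            hits.append(name)
    return hits

regions = [("zone-a", (0, 0, 10, 10)), ("zone-b", (5, 5, 20, 20))]
print(match_point(regions, 7, 7))  # point lies inside both rectangles
```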
From: Fabian Hueske
Date: Wednesday, February 22, 2017 at 2:41 PM
To:
Subject: Re: Cross operation on two huge datasets
Hi Gwen,