Hello,
I'm trying to set up a testing environment using:
- Flink HA with ZooKeeper
- Docker Compose
While starting, the TaskManager generates an exception and after some
restarts it fails.
The exception is:
"Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenExceptio
Hello,
We are migrating our HA setup from ZooKeeper to Kubernetes, and we have a
question regarding the RPC port.
Previously with ZooKeeper, the RPC connection config was:
high-availability.jobmanager.port
We were expecting the config to be the same with Kubernetes HA, as the doc
says: "The port (range) used b
To add some more information to Gyula's comment:
For application mode without checkpoints, you do not need to activate HA,
since it will not take any effect and the Flink job will simply be submitted
again after the JobManager restarts, because the job submission happens on
the JobManager side.
For sess
Without HA, if the JobManager goes down, job information is lost, so the job
won't be restarted after the JM comes back up.
Gyula
On Thu, 13 Oct 2022 at 19:07, marco andreas wrote:
Hello,
Can someone explain what the point of using HA is when deploying an
application cluster with a single JM and checkpoints are not activated?
AFAIK, when the pod of the JM goes down, Kubernetes will restart it anyway,
so we don't need to activate HA in this case.
Maybe there's som
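Tying the two answers together: HA pays off here only when checkpointing is
enabled, so the restarted JobManager has something to resume from. A hedged
sketch (key names per recent Flink releases; paths and ids are hypothetical):
```
execution.checkpointing.interval: 1 min
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
high-availability: kubernetes                      # shorthand in recent releases
high-availability.storageDir: s3://my-bucket/flink/ha
kubernetes.cluster-id: my-application-cluster
```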
Could you please check that the allocated load balancer can be accessed
locally (on the Flink client side)?
Best,
Yang
On Thu, Jul 29, 2021 at 7:45 PM, Fabian Paul wrote:
Hi Dhiru,
Sorry for the late reply. Once the cluster is successfully started, the web UI
should be reachable if you somehow forward the port of the running pod.
Although, with the exception you have shared, I suspect the cluster never fully
runs (or not long enough). Can you share the full stacktra
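Besides `kubectl port-forward`, a small Service manifest is another way to
reach the UI. This is a hedged sketch; the pod labels and the name are
assumptions about the deployment at hand:
```
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager-ui        # hypothetical name
spec:
  type: NodePort
  selector:
    app: flink                     # assumed labels on the JobManager pod
    component: jobmanager
  ports:
    - port: 8081                   # Flink's default REST/web UI port
      targetPort: 8081
```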
Hi Dhiru,
No worries, I completely understand your point. Usually, all the executable
scripts from Flink can be found in the main repository [1].
We also provide a community edition of our commercial product [2], which
manages the lifecycle of the cluster so that you do not have to use these
scripts an
Hi Dhirendra,
Thanks for reaching out. A good way to start is to have a look at [1] and [2].
Once you have everything set up, it should be possible to delete the
JobManager's pod while an application is running and have the job
successfully recover.
You can use one of the example Flink applicati
Hi, I am very new to Flink. I am planning to install a Flink HA setup on an
EKS cluster with 5 worker nodes. Can someone please point me to the right
materials or direction on how to install it, as well as any sample job I can
run just for testing, to confirm everything is working as expected?
>> at
>> org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:159)
>> [flink-dist_2.11-1.11.2.jar:1.11.2]
>> at
>> org.apache.flink.runtime.rest.FileUploadHandler.channelRead0(FileUploadHandler.java:68)
>> [flink-dist_2.11-1.11.2.jar:1.11.2]
>> at
>> org.apach
ation-for-aws
Cheers,
Till
On Wed, Dec 2, 2020 at 11:31 AM sidhant gupta wrote:
Hi All,
I have 2 JobManagers in a Flink HA cluster setup, with a load balancer
forwarding requests to both (leader and standby) JobManagers in the default
round-robin fashion. While uploading the job jar, the web UI fluctuates
between the leader and standby pages. It's difficult to upload
Thank you both for the answers.
I just want to get this right:
can I achieve HA for the Job Cluster Docker config by having the ZooKeeper
quorum configured as mentioned in [1] (with S3 and ZooKeeper)?
I assume I need to modify the default Job Cluster config to match the [1] setup.
[1]
https://ci.apach
As tison said, you could use a Deployment to restart the JobManager pod.
However, if you want all jobs to be able to recover from their checkpoints,
you also need ZooKeeper and HDFS/S3 to store the high-availability data.
Also, native Kubernetes HA support is planned [1].
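A minimal sketch of the Deployment approach mentioned above, so Kubernetes
recreates the JobManager pod after a failure; the name, labels, and image tag
are assumptions:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager           # hypothetical name
spec:
  replicas: 1                      # Kubernetes restarts the pod if it dies
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:1.9         # assumed version for this thread's era
          args: ["jobmanager"]
```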
Hi Krzysztof,
Flink doesn't provide JM failover itself yet.
For a YARN deployment, you can rely on the yarn.application-attempts
configuration [1];
for a Kubernetes deployment, Flink uses a Kubernetes Deployment to restart a
failed JM.
Standalone mode, though, doesn't tolerate JM failure, and the strategies
above
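A hedged flink-conf.yaml sketch of the YARN option mentioned above; note that
the effective limit is additionally capped by YARN's own
yarn.resourcemanager.am.max-attempts on the cluster side:
```
yarn.application-attempts: 4   # ApplicationMaster (JM) restarts YARN will attempt
```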
Hi,
In [1] we can find the setup for Standalone and YARN clusters to achieve
JobManager HA.
Is Standalone Cluster High Availability with ZooKeeper the same approach
for the Docker Job Cluster approach with Kubernetes?
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/jobma
Hello!
I am having some difficulty with multiple job managers in an HA setup using
Flink 1.9.0.
I have 2 job managers and have set up HA with the following config:
high-availability: zookeeper
high-availability.cluster-id: /imet-enhance
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum:
flink-state-hdfs-zookeeper-1.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181,flink-state-hdfs-zookeeper-2.flink-state-hdfs-zookeeper-headles
>> In that test, I set "yarn.application-attempts" to 5, but the Flink cluster
>> was recovered more than 5 times.
>> Does anyone know what "yarn.application-attempts" means, and when the Flink
>> cluster's attempt count will be incremented?
> ... but I still don't get it.
> https://stackoverflow.com/questions/56225088/why-is-flink-ha-cluster-on-yarn-recovered-more-than-the-maximum-number-of-attemp
> Best,
> --
> Kazunori Shinhira
> Mail : k.shinhira.1...@gmail.com
high-availability.cluster-id: /cluster1
high-availability.storageDir: /flink/ha/
high-availability.zookeeper.quorum:
flink-state-hdfs-zookeeper-1.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181,flink-state-hdfs-zookeeper-2.flink-state-hdfs-zookeeper-headless.default.svc.cluster.local:2181
Thanks for the info. I have managed to launch an HA cluster by adding the
rpc.address for all job managers.
But it did not work with start-cluster.sh; I had to add them one by one.
Hi,
It will use the HA settings as long as you specify high-availability:
zookeeper. The jobmanager.rpc.address is used by the JobManager as a binding
address. You can verify this by starting two JobManagers and then killing
the leader.
Best,
Dawid
On Tue, 21 Aug 2018 at 17:46, mozer wrote:
Yeah, you are right. I have already tried setting jobmanager.rpc.address and
it works in that case, but if I use this setting I will not be able to use
HA, am I right?
How can the job manager register to ZooKeeper with the right address and not
localhost?
Hi,
In your case the JobManager binds itself to localhost, and that's what it
writes to ZooKeeper. Try starting the JobManager manually with
jobmanager.rpc.address set to the IP of the machine you are running the
JobManager on. In other words, make sure the JobManager binds itself to the
right IP.
Regards
FQDN or full IP; I tried all of them, still no changes ...
For the SSH connection, I can connect to each machine without passwords.
Do you think the problem can come from
*high-availability.storageDir: file:///shareflink/recovery* ?
I don't use HDFS storage but a NAS file system, which is co
First of all, try with the FQDN or full IP.
Also, in order to run an HA cluster, you need to make sure that you have
passwordless SSH access between your master and slaves.
On Tue, Aug 21, 2018 at 4:15 PM mozer wrote:
I am trying to install a Flink HA cluster (ZooKeeper mode), but the task
manager cannot find the job manager.
Here is the architecture:
- Machine 1 : Job Manager + Zookeeper
- Machine 2 : Task Manager
masters:
Machine1
slaves:
Machine2
flink-conf.yaml
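The excerpt truncates before the actual flink-conf.yaml; a hedged sketch of
what a working config for this two-machine layout could look like (the host
name and storage path come from the thread, the rest are assumptions):
```
jobmanager.rpc.address: Machine1                  # a routable host name, not localhost
high-availability: zookeeper
high-availability.zookeeper.quorum: Machine1:2181
high-availability.storageDir: file:///shareflink/recovery   # shared storage reachable from both machines
```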
We're looking at DR scenarios for our Flink cluster. We already use
ZooKeeper for JM HA. We use an HDFS cluster that's replicated off-site, and
our
high-availability.zookeeper.storageDir
property is configured to use HDFS.
However, in the event of a site failure, is it also essential that we have
a
erridden. By default, the temporary directory is used.
- jobmanager.web.upload.dir: The config parameter defining the directory
for uploading the job jars. If not specified, a dynamic directory will be
used under the directory specified by jobmanager.web.tmpdir.
Regards,
Chirag
On Sunday, 6 May, 2018, 12:29:43 AM IST, Rohil Surana wrote:
Hi,
I have a very basic Flink HA setup on Kubernetes and want to retain job
jars across JobManager restarts.
For HA I am using ZooKeeper and an NFS drive mounted on all pods
(JobManager and TaskManagers), which is used for checkpoints, and I have
also set `web.upload.dir: /data/flink
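A hedged sketch completing the idea: point the upload directory at the shared
NFS mount so jars survive JobManager restarts (the exact path is hypothetical;
the thread's value is truncated):
```
web.upload.dir: /data/flink/uploads    # hypothetical path on the NFS volume mounted in all pods
```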
Hi Alexis,
Were you able to make this work? I am also looking for Zeppelin integration
with Flink, and this might be helpful.
Thanks,
Santosh
hypra) wrote:
> Hi – We're currently testing Flink HA and running into a ZooKeeper timeout
> issue. Error log below.
> Is there a production checklist, or any information on parameters related
> to Flink HA, that I need to pay attention to?
> Any pointers would really help
If you don't want to actually rip way into the code for the Job Manager,
the ETCD Operator <https://github.com/coreos/etcd-operator> would be a good
way to bring up an etcd cluster that is separate from the core Kubernetes
etcd database. Combined with zetcd you could probably have that up and
running quickly.
Thanks,
James Bucher
From: Hao Sun <ha...@zendesk.com>
Date: Monday, August 21, 2017 at 9:45 AM
To: Stephan Ewen <se...@apache.org>, Shannon Carey <sca...@expedia.com>
Cc: "user@flink.apache.org"
where the JobManager stores information which needs to be recovered after
the JobManager fails.
We're eyeing https://github.com/coreos/zetcd as a way to run ZooKeeper on
top of Kubernetes' etcd cluster so that we don't have to rely on a separate
ZooKeeper cluster. However, we haven't tried it yet.
-Shannon
From: Hao Sun <ha...@zendesk.com>
Date: Sunday, August 20, 2017 at 9:04 PM
To: "user@flink.apache.org"
Hi, I am new to Flink and trying to bring up a Flink cluster on top of
Kubernetes.
For an HA setup on Kubernetes, I think I just need one JobManager and do not
need ZooKeeper? I will store all state in S3 buckets, so in case of failure
Kubernetes can just bring up a new JobManager without losi
Hi Robert, Hi Till,
I tried to set the high-availability options in Zeppelin, but I guess it's
just a matter of Flink version compatibility on the Zeppelin side. I'll try
to compile Zeppelin against 1.2 and add the needed parameters to see if it
works better.
Thanks for your help!
2017-03-27 15:09 GMT+02:00 Till Rohr
Hi Maciek and Alexis,
as far as I can tell, it is currently not possible to use Zeppelin with a
Flink cluster running in HA mode. In order to make it work, it would be
necessary to specify either a Flink configuration for the Flink interpreter
(this is probably the most general solution) o
Hi Alexis,
did you set the ZooKeeper configuration for Flink in Zeppelin?
On Mon, Mar 20, 2017 at 11:37 AM, Alexis Gendronneau
<a.gendronn...@gmail.com> wrote:
Hello users,
Like Maciek, I'm currently trying to make Apache Zeppelin 0.7 work with
Flink. I have two versions of Flink available (1.1.2 and 1.2.0), each
running in high-availability mode.
When running jobs from Zeppelin in Flink local mode everything works fine,
but when trying to subm
Shouldn't the else branch
```
else
    HIGH_AVAILABILITY=${DEPRECATED_HA}
fi
```
set it to `zookeeper`? Of course, the truth is whatever the script
execution prints out. ;-)
PS: Emails like this should either go to the dev list, or it's also fine
to open an issue and discuss there (and potentially
Hi,
I've tried to start a cluster in HA mode as described in the docs, but with
the current state of bin/config.sh I failed.
I think there is a bug in configuring the HIGH_AVAILABILITY variable in
this block (bin/config.sh):
if [ -z "${HIGH_AVAILABILITY}" ]; then
    HIGH_AVAILABILITY=$(readFromConfi
+Till Rohrmann, do you know what can be used to access an HA cluster from
that setting?
Adding Till since he probably knows the HA stuff best.
On Sun, 22 Jan 2017 at 15:58 Maciek Próchniak wrote:
Hi,
I have a standalone Flink cluster configured with the HA setting (i.e. with
ZooKeeper recovery). How should I access it remotely, e.g. with a Zeppelin
notebook or the Scala shell?
There are settings for host/port, but with the HA setting they are not
fixed; if I check which host is the *current leader* and
Hi Thomas,
To avoid having jobs restart forever, you have to cancel them manually
(from the web interface or the /bin/flink client).
Also, you can set an appropriate restart strategy (in 1.0-SNAPSHOT), which
limits the number of retries. This way the retrying will eventually stop.
On Fri, Feb
On Thu, Feb 18, 2016 at 6:59 PM, Thomas Lamirault
wrote:
> We are trying flink in HA mode.
Great to hear!
> We set in the flink yaml :
>
> state.backend: filesystem
>
> recovery.mode: zookeeper
> recovery.zookeeper.quorum:
>
> recovery.zookeeper.path.root:
>
> recovery.zookeeper.storageDir:
>
Hi!
We are trying Flink in HA mode.
Our application is a streaming application with a windowing mechanism.
We set in the Flink yaml:
state.backend: filesystem
recovery.mode: zookeeper
recovery.zookeeper.quorum:
recovery.zookeeper.path.root:
recovery.zookeeper.storageDir:
recovery.back
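The values were elided in the archive; a hedged example of the legacy
(pre-1.0) keys listed above, with hypothetical values:
```
state.backend: filesystem
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181   # hypothetical hosts
recovery.zookeeper.path.root: /flink
recovery.zookeeper.storageDir: hdfs:///flink/recovery      # hypothetical shared path
```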
Hi Stefano,
The job should stop temporarily but then be resumed by the new JobManager.
Have you increased the number of execution retries? AFAIK, it is set to 0
by default. This will not re-run the job, even in HA mode. You can enable
it on the StreamExecutionEnvironment.
Otherwise, you have probably already found the documentation:
https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html#configuration
Cheers,
Max
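The execution-retries setting Max refers to predates today's declarative
restart strategies; on later releases the equivalent is configured as below
(a hedged sketch; key names per the 1.x docs):
```
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```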
> On 15 Feb 2016, at 13:40, Stefano Baghino wrote:
>
> Hi Ufuk, thanks for replying.
>
> Regarding the masters file: yes, I've specified all the masters and checked
> that they were actually running after start-cluster.sh. I'll gladly share
> the logs as soon as I get to see them.
Can you please share the job manager logs of all started job managers?
– Ufuk
On Mon, Feb 15, 2016 at 12:35 PM, Stefano Baghino wrote:
Hello everyone,
last week I ran some tests with Apache ZooKeeper to get a grip on Flink's
HA features. My tests have gone badly so far and I can't sort out the reason.
My latest tests involved Flink 0.10.2, run as a standalone cluster with 3
masters and 4 slaves. The 3 masters are also the ZooKeeper (3.4.6) ensemble.
I've started
done yet.
Best,
Fabian
On Sep 10, 2015 01:29, "Emmanuel" wrote:
Is this a 0.10-SNAPSHOT feature only? I'm using 0.9.1 right now.
From: ele...@msn.com
To: user@flink.apache.org
Subject: RE: Flink HA mode
Date: Wed, 9 Sep 2015 16:11:38 -0700
Been playing with the HA... I find the UIs confusing here: in the dashboard
on one side I see 0 slots, 0 taskman
e multiple JMs' IPs in the jobmanager.rpc.address?
Thanks
Date: Wed, 9 Sep 2015 10:19:36 +0200
Subject: Re: Flink HA mode
From: trohrm...@apache.org
To: user@flink.apache.org
The only necessary information for the JobManager and TaskManager is to
know where to find the ZooKeeper quorum to do leader election and retrieve
the leader address from. This will be configured via the config parameter
`ha.zookeeper.quorum`.
On Wed, Sep 9, 2015 at 10:15 AM, Stephan Ewen wrote:
TL;DR is that you are right, it is only the initial list. If a JobManager
comes back with a new IP address, it will be available.
On Wed, Sep 9, 2015 at 8:35 AM, Ufuk Celebi wrote:
> On 09 Sep 2015, at 04:48, Emmanuel wrote:
>
> my question is: how critical is the bootstrap IP list in masters?
Hey Emmanuel,
good questions. I read over the docs for this again [1] and you are right
that we should make this clearer.
The "masters" file is only relevant for the start/stop
My question is: how critical is the bootstrap IP list in masters? Does this
get updated, or does it have to be updated by some other service?
From: zhangruc...@huawei.com
To: user@flink.apache.org
Subject: re: Flink HA mode
Date: Wed, 9 Sep 2015 00:48:42 +0000
In order to discover new
In order to discover new
[mailto:ele...@msn.com]
发送时间: 2015年9月9日 7:59
收件人: user@flink.apache.org
主题: Flink HA mode
Looking at Flink HA mode.
Why do you need to have the list of masters in the config if zookeeper is used
to keep track of them?
In an environment like Google Cloud or Container Engine, the JM may come back
up
Looking at Flink HA mode.
Why do you need to have the list of masters in the config if ZooKeeper is
used to keep track of them? In an environment like Google Cloud or Container
Engine, the JM may come back up but will likely have another IP address.
Is the masters config file only for