Re: Flink on kubernetes HA ::Renew deadline reached

2021-10-25 Thread marco
Hello, thanks for your response. I just want to know why you suggest that I should use the native Kubernetes HA. In my case I need to use the standalone application mode. On 2021/10/25 09:25:21, Xintong Song wrote: > You should be using the native Kubernetes HA. This error message suggests > Fl

Re: Flink on kubernetes HA ::Renew deadline reached

2021-10-25 Thread marco
Any suggestions would be appreciated. On 2021/10/20 16:18:39, marco wrote: > > > Hello flink community:: > > I am deploying flink application cluster standalone mode on kubernetes, but i > am facing some problems > > the job starts normally and it continues to run but at some point in time

RE: [External] Re: Flink on Kubernetes

2021-09-03 Thread Julian Cardarelli
From: Guowei Ma Sent: Thursday, September 2, 2021 11:32 PM To: Julian Cardarelli Cc: user Subject: [External] Re: Flink on Kubernetes Hi, Julian I notice that your configuration includes "restart-strategy.fixed-delay.attempt

Re: Flink on Kubernetes

2021-09-02 Thread Guowei Ma
Hi, Julian I notice that your configuration includes "restart-strategy.fixed-delay.attempts: 10". It means that the job fails permanently after 10 failed restart attempts. That may be why the job does not restart again, so you could increase this value. But I am not sure if this is the root cause. So if this do
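
As a rough illustration of the setting discussed in this message, a flink-conf.yaml fragment might look like the sketch below. Only the attempts key and its value of 10 are quoted in the thread. The strategy selector line and the delay value are assumptions added for completeness, and the key names follow the older flink-conf.yaml style that matches the quoted key.

```yaml
# flink-conf.yaml (sketch; only the attempts key and value come from the thread)
restart-strategy: fixed-delay               # assumption: fixed-delay is the strategy in use
restart-strategy.fixed-delay.attempts: 10   # once these restart attempts are used up, the job fails for good
restart-strategy.fixed-delay.delay: 10 s    # assumption: illustrative pause between restart attempts
```

Raising the attempts value only postpones the final failure; it does not explain why the restarts are needed in the first place.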

Re: Flink on Kubernetes, Task/Job Manager Recycles

2021-01-29 Thread Yang Wang
I think you need to enable HA (high availability) for your Flink cluster[1]. Currently, we have the ZooKeeperHAService and KubernetesHAService. In HA mode, all the metadata (e.g. job graph path, checkpoint counter, checkpoint path) is stored in ZooKeeper or a Kubernetes ConfigMap. And the
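
A minimal sketch of what enabling either HA service could look like in flink-conf.yaml. None of these values appear in the thread: the cluster id, storage path and ZooKeeper hosts are hypothetical placeholders. The key names are the ones used by Flink releases around the time of this message, and newer releases may spell them differently.

```yaml
# flink-conf.yaml (sketch; every value below is a placeholder, not taken from the thread)

# Option A: Kubernetes-based HA
kubernetes.cluster-id: my-flink-cluster                 # hypothetical cluster id
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://my-bucket/flink/ha   # hypothetical durable storage location

# Option B: ZooKeeper-based HA (use instead of option A)
# high-availability: zookeeper
# high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181   # hypothetical hosts
# high-availability.storageDir: s3://my-bucket/flink/ha
```

In both variants the storage directory has to point at storage that survives a JobManager restart, since that is where the recovery data itself is kept.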

Re: Flink on Kubernetes

2020-05-21 Thread Yang Wang
Hi lvan Yang, #1. If a TaskManager crashes unexpectedly while tasks are running on it, it cannot rejoin gracefully. Whether the full job will be restarted depends on the failover strategies[1]. #2. Currently, when new TaskManagers join the Flink cluster, the running Flink job co

Re: Flink on Kubernetes unable to Recover from failure

2020-05-08 Thread Yun Tang
ontainers running do not mean they're all registered to the job manager, I think you could refer to the JM and TM log to see whether the register connection is lost. Best Yun Tang From: Robert Metzger Sent: Friday, May 8, 2020 22:33 To: Morgan Geldenhuys Cc:

Re: Flink on Kubernetes unable to Recover from failure

2020-05-08 Thread Robert Metzger
Hey Morgan, Is it possible for you to provide us with the full logs of the JobManager and the affected TaskManager? This might give us a hint why the number of task slots is zero. Best, Robert On Tue, May 5, 2020 at 11:41 AM Morgan Geldenhuys < morgan.geldenh...@tu-berlin.de> wrote: > > Commun

Re: Flink on Kubernetes Vs Flink Natively on Kubernetes

2020-03-17 Thread Pankaj Chand
Thank you, Yang and Xintong! Best, Pankaj On Mon, Mar 16, 2020, 9:27 PM Yang Wang wrote: > Hi Pankaj, > > Just like Xintong has said, the biggest difference of Flink on Kubernetes > and native > integration is dynamic resource allocation. Since the latter has an > embedded K8s > client and wil

Re: Flink on Kubernetes Vs Flink Natively on Kubernetes

2020-03-16 Thread Yang Wang
Hi Pankaj, Just like Xintong has said, the biggest difference between Flink on Kubernetes and the native integration is dynamic resource allocation, since the latter has an embedded K8s client and communicates with the K8s API server directly to allocate/release JM/TM pods. Both for the two ways to run Fl

Re: Flink on Kubernetes Vs Flink Natively on Kubernetes

2020-03-16 Thread Pankaj Chand
Hi Xintong, Thank you for the explanation! If I run Flink "natively" on Kubernetes, will I also be able to run Spark on the same Kubernetes cluster, or will it make the Kubernetes cluster be reserved for Flink only? Thank you! Pankaj On Mon, Mar 16, 2020 at 5:41 AM Xintong Song wrote: > Forg

Re: Flink on Kubernetes Vs Flink Natively on Kubernetes

2020-03-16 Thread Xintong Song
Hi Pankaj, "Running Flink on Kubernetes" refers to the old way that basically deploys a Flink standalone cluster on Kubernetes. We leverage scripts to run Flink Master and TaskManager processes inside Kubernetes containers. In this way, Flink is not aware of whether it's running in containers or dir

Re: Flink on Kubernetes Vs Flink Natively on Kubernetes

2020-03-16 Thread Xintong Song
Forgot to mention that "running Flink natively on Kubernetes" is newly introduced and is only available for Flink 1.10 and above. Thank you~ Xintong Song On Mon, Mar 16, 2020 at 5:40 PM Xintong Song wrote: > Hi Pankaj, > > "Running Flink on Kubernetes" refers to the old way that basically d

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-28 Thread Hao Sun
Sounds good. Thank you! Hao Sun On Thu, Feb 27, 2020 at 6:52 PM Yang Wang wrote: > Hi Hao Sun, > > I just posted the explanation to the user ML in case others > have the same problem. > > Given the job graph is fetched from the jar, do we still need Zookeeper for >> HA? Maybe we still

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-27 Thread Yang Wang
Hi Hao Sun, I just posted the explanation to the user ML in case others have the same problem. Given the job graph is fetched from the jar, do we still need Zookeeper for > HA? Maybe we still need it for checkpoint locations? Yes, we still need ZooKeeper (maybe in the future we will

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-27 Thread Yang Wang
Hi Jin Yi, For a standalone per-job cluster, recovery is a little different. Just as you say, the user jar is built into the image, so when the JobManager fails and is relaunched by K8s, the user `main()` is executed again to get the job graph, unlike the session cluster, which gets the job

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread Jin Yi
Hi Yang, regarding your statement below: Since you are starting JM/TM with K8s deployment, when they failed new JM/TM will be created. If you do not set the high availability configuration, your jobs could recover when TM failed. However, they could not recover when JM failed. With HA configured,

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread Yang Wang
I think the only limitation is the disk size of your kubelet machine. Please remember to set the "sizeLimit" of your emptyDir. Otherwise, your pod may be killed when ephemeral storage is full. Best, Yang M Singh wrote on Thu, Feb 27, 2020 at 8:34 AM: > BTW - Is there any limit to the amount of data that c
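
For illustration, a pod spec fragment that sets "sizeLimit" on an emptyDir volume could look like the sketch below. The container name, image tag, volume name, mount path and the 10Gi figure are all hypothetical and not taken from the thread.

```yaml
# Pod spec fragment (sketch; names, path and size are hypothetical)
spec:
  containers:
    - name: taskmanager
      image: flink:latest          # hypothetical image tag
      volumeMounts:
        - name: flink-tmp
          mountPath: /tmp/flink    # hypothetical mount path
  volumes:
    - name: flink-tmp
      emptyDir:
        sizeLimit: 10Gi            # caps how much node disk this volume may use
```

The limit should be chosen against the free disk on the node, since emptyDir data lives on the kubelet machine's local storage.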

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread M Singh
BTW - Is there any limit to the amount of data that can be stored on emptyDir in K8 ?   On Wednesday, February 26, 2020, 07:33:54 PM EST, M Singh wrote: Thanks Yang and Arvid for your advice and pointers.  Mans On Wednesday, February 26, 2020, 09:52:26 AM EST, Arvid Heise wrote:

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread M Singh
Thanks Yang and Arvid for your advice and pointers.  Mans On Wednesday, February 26, 2020, 09:52:26 AM EST, Arvid Heise wrote: Creds on AWS are typically resolved through roles assigned to K8s pods (for example with KIAM [1]). [1] https://github.com/uswitch/kiam On Tue, Feb 25, 2020 at

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread Arvid Heise
Creds on AWS are typically resolved through roles assigned to K8s pods (for example with KIAM [1]). [1] https://github.com/uswitch/kiam On Tue, Feb 25, 2020 at 3:36 AM Yang Wang wrote: > Hi M Singh, > > > Mans - If we use the session based deployment option for K8 - I thought >> K8 will automat

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-24 Thread Yang Wang
Hi M Singh, > Mans - If we use the session based deployment option for K8 - I thought > K8 will automatically restart any failed TM or JM. > In the case of failed TM - the job will probably recover, but in the case > of failed JM - perhaps we need to resubmit all jobs. > Let me know if I have mis

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-24 Thread M Singh
Thanks Wang for your detailed answers. From what I understand the native_kubernetes also leans towards creating a session and submitting a job to it. Regarding other recommendations, please see my inline comments and advice. On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang wrote:

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-23 Thread Yang Wang
Hi Singh, Glad to hear that you are looking to run Flink on Kubernetes. I will try to answer your question based on my limited knowledge, and others can correct me and add more. I think the biggest difference between session cluster and per-job cluster on Kubernetes is the i

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-12 Thread Li Peng
>>>> >>>> Li Peng wrote on Wed, Dec 11, 2019 at 1:24 PM: >>>> >>>>> Ah I see. I think the Flink app is reading files from >>>>> /opt/flink/conf correctly as it is, since changes I make to flink-conf are >>>>> picked up as expected, it

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-12 Thread Li Peng
used, or don't apply to stdout or whatever source k8 uses for its >>>> logs? Given that the pods don't seem to have logs written to file >>>> anywhere, contrary to the properties, I'm inclined to say it's the former >>>> and that the log4j pr

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-12 Thread ouywl
have no idea why though. On Tue, Dec 10, 2019 at 6:56 PM Yun Tang <myas...@live.com> wrote: Sure, /opt/flink/conf is mounted as a volume from the configmap. Best Yun Tang From: Li Peng <li.p...@doordash.com> Date: Wednesday, December 11, 2019 at 9:37 AM To: Yang Wang <danr

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread ouywl
't being picked up. Still have no idea why though. On Tue, Dec 10, 2019 at 6:56 PM Yun Tang <myas...@live.com> wrote: Sure, /opt/flink/conf is mounted as a volume from the configmap. Best Yun Tang From: Li Peng <li.p...@doordash.com> Date: Wednesday, December 11, 2019 at 9:

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread Yang Wang
ds don't seem to have logs written to file >>> anywhere, contrary to the properties, I'm inclined to say it's the former >>> and that the log4j properties just aren't being picked up. Still have no >>> idea why though. >>> >>> On Tue,

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-11 Thread Li Peng
't being picked up. Still have no >> idea why though. >> >> On Tue, Dec 10, 2019 at 6:56 PM Yun Tang wrote: >> >>> Sure, /opt/flink/conf is mounted as a volume from the configmap. >>> >>> >>> >>> Best >>> >>> Yun

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Yang Wang
ue, Dec 10, 2019 at 6:56 PM Yun Tang wrote: > >> Sure, /opt/flink/conf is mounted as a volume from the configmap. >> >> >> >> Best >> >> Yun Tang >> >> >> >> *From: *Li Peng >> *Date: *Wednesday, December 11, 2019 at 9:37

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Li Peng
, /opt/flink/conf is mounted as a volume from the configmap. > > > > Best > > Yun Tang > > > > *From: *Li Peng > *Date: *Wednesday, December 11, 2019 at 9:37 AM > *To: *Yang Wang > *Cc: *vino yang , user > *Subject: *Re: Flink on Kubernetes seems to ignore

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Yun Tang
Sure, /opt/flink/conf is mounted as a volume from the configmap. Best Yun Tang From: Li Peng Date: Wednesday, December 11, 2019 at 9:37 AM To: Yang Wang Cc: vino yang , user Subject: Re: Flink on Kubernetes seems to ignore log4j.properties 1. Hey Yun, I'm calling /opt/flink/bin/stand

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-10 Thread Li Peng
1. Hey Yun, I'm calling /opt/flink/bin/standalone-job.sh and /opt/flink/bin/taskmanager.sh on my job and task managers respectively. It's based on the setup described here: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ . I haven't tried the configmap approach yet, doe

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Yang Wang
Hi Li Peng, You are running a standalone session cluster or per-job cluster on Kubernetes, right? If so, I think you need to check the log4j.properties in the image, not the local one. The log is stored in /opt/flink/log/jobmanager.log by default. If you are running active Kubernetes integration for a fre

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread Yun Tang
Hi Peng What kind of deployment of K8s did you try in flink-doc[1], if using session mode, you can control your log4j configuration via configmap [2]. From my experience, this could control the log4j well. If you did not override the command of flink docker, it will start-foreground the taskma

Re: Flink on Kubernetes seems to ignore log4j.properties

2019-12-09 Thread vino yang
Hi Li, A potential reason could be conflicting logging frameworks. Can you share the log in your .out file and let us know if the print format of the log is the same as the configuration file you gave. Best, Vino Li Peng wrote on Tue, Dec 10, 2019 at 10:09 AM: > Hey folks, I noticed that my kubernetes flink

Re: Flink on Kubernetes - Hostname resolution between job/tasks-managers

2019-01-15 Thread bastien dine
Never mind. The problem was already discussed in the thread "Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment" -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io On Tue, Jan 15, 2019 at 15:16, bastien dine wrote: > Hello, >

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Till Rohrmann
Hi Alexandru, minikube ssh 'sudo ip link set docker0 promisc on' is not supposed to solve the problem you are seeing. It only resolves the problem if the JobMaster wants to reach itself through the jobmanager-service name. Your problem seems to be something else. Could you check if jobmanager-serv

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Alexandru Gutan
Got it working on the Google Cloud Platform Kubernetes service... More support for Minikube is needed. On Wed, 19 Dec 2018 at 13:44, Alexandru Gutan wrote: > I've found this in the archives: > http://mail-archives.apache.org/mod_mbox/flink-dev/201804.mbox/%3CCALbFKXr=rp9TYpD_JA8vmuWbcjY0+Lp2mbr4

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Alexandru Gutan
I've found this in the archives: http://mail-archives.apache.org/mod_mbox/flink-dev/201804.mbox/%3CCALbFKXr=rp9TYpD_JA8vmuWbcjY0+Lp2mbr4Y=0fnh316hz...@mail.gmail.com%3E And as suggested I tried a different startup order but unsuccessful: kubectl create -f jobmanager-deployment.yaml kubectl create

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Dawid Wysakowicz
Hi Alexandru, This sounds reasonable that it might be because of this minikube command failed, but I am not a kubernetes expert. I cc Till who knows more on this. Best, Dawid On 19/12/2018 14:16, Alexandru Gutan wrote: > Thanks! > I'm using now the *flink:1.7.0-hadoop24-scala_2.12* image. > The

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Alexandru Gutan
Thanks! I'm using now the *flink:1.7.0-hadoop24-scala_2.12* image. The Hadoop related error is gone, but I have a new error: Starting Task Manager config file: jobmanager.rpc.address: flink-jobmanager jobmanager.rpc.port: 6123 jobmanager.heap.size: 1024m taskmanager.heap.size: 1024m taskmanager.nu

Re: Flink on Kubernetes (Minikube)

2018-12-19 Thread Dawid Wysakowicz
Hi, You used a hadoopless docker image, therefore it cannot find hadoop dependencies. It is ok if you don't need to use any, the bolded messages are just INFO, those are not errors. Best, Dawid On 19/12/2018 12:58, Alexandru Gutan wrote: > Dear all, > > I followed the instructions found here: >

Re: Flink on kubernetes

2018-09-03 Thread Lasse Nedergaard
Please try to use FsStateBackend as a test to see if the problems disappear. Best regards Lasse Nedergaard > On 3 Sep 2018 at 11:46, 祁明良 wrote: > > Hi Lasse, > > Is there a JIRA ticket I can follow? > > Best, > Mingliang > >> On 3 Sep 2018, at 5:42 PM, Lasse Nedergaard

Re: Flink on kubernetes

2018-09-03 Thread 祁明良
Hi Lasse, Is there a JIRA ticket I can follow? Best, Mingliang > On 3 Sep 2018, at 5:42 PM, Lasse Nedergaard wrote: > > Hi. > > We have documented the same on Flink 1.4.2/1.6 running on Yarn and Mesos. > If you correlate the non-heap memory together with job restart you will see > non-heap inc

Re: Flink on kubernetes

2018-09-03 Thread Lasse Nedergaard
Hi. We have documented the same on Flink 1.4.2/1.6 running on Yarn and Mesos. If you correlate the non-heap memory with job restarts, you will see non-heap memory increase with every restart until you get an OOM. I'll let you know if/when I know how to handle the problem. Best regards

Re: flink on kubernetes

2018-08-13 Thread Till Rohrmann
Hi Mingliang, I'm currently writing the updated documentation for Flink's job cluster container entrypoint. It should be ready later today. In the meantime, you can check out the `flink-container` module and its subdirectories. They already contain some information in the README.md on how to create

Re: flink on kubernetes

2018-08-12 Thread vino yang
Hi mingliang, Yes, you are right, the information that Flink on Kubernetes' current documentation can provide is not very detailed. However, considering that Kubernetes is so popular, the Flink community is currently refining it, this work is mainly done by Till, and you can follow this issue [1]

Re: Flink on kubernetes -> shell deployment

2017-06-08 Thread Kaepke, Marc
Hi Nico, thanks for your help. $ kubectl exec -it /bin/bash that was what I was looking for. This command provides a shell directly into my job-manager instance. Best, Marc > On 08.06.2017 at 12:05, Nico Kruber wrote: > > If you have access to the web dashboard, you probably have access to

Re: Flink on kubernetes -> shell deployment

2017-06-08 Thread Nico Kruber
If you have access to the web dashboard, you probably have access to the Jobmanager in general and can submit jobs from your command line by passing flink run --jobmanager ... I've looped in Patrick in case I am missing something kubernetes-specific here. Nico On Wednesday, 7 June 2017 16:0