Flink k8s operator

2024-12-11 Thread Xiang Wang via user
Hello Flink group, I am currently evaluating Flink K8s operator to manage our Flink applications, but there are some concerns raised by our K8s cluster admins which I hope to get answers from this group: 1. What is the webhook used for? The documentation of webhook is very limited. Am I risking

Re: Flink k8s operator unstable deployment

2024-08-31 Thread Naci Simsek
Hi Arthur,In your initial mail, it was seen an explicit job id set:$internal.pipeline.job-id, 044d28b712536c1d1feed3475f2b8111This might be the reason of duplicatedJobSubmission exception.In the job config on your last reply, I could not see such setting. You could verify from the JM logs that when

Re: Flink k8s operator unstable deployment

2024-08-30 Thread Arthur Catrisse via user
Hi Naci, Thanks for your answer. We do not explicitly define the job-id. As we are using the flink-kubernetes-operator, I suppose it's the operator handling this ID. The job is defined in the FlinkDeployment charts, where we have specs for jobmanager, taskmanager and the job : job: jarURI: local:/

Re: Flink k8s operator unstable deployment

2024-08-28 Thread Naci Simsek
Hi Arthur,How you submit your job? Are you explicitly setting job id when submitting the job?Have you also tried without HA to see the behavior?Looks like the job is submitted with the same ID with the previous job, which the job result stored in HA does not let you submit it with the same job_id.B

Flink k8s operator unstable deployment

2024-08-28 Thread Arthur Catrisse via user
Hello, We are running into issues when deploying flink on kubernetes using the flink-kubernetes-operator with a FlinkDeployment Occasionally, when a *JobManager* gets rotated out (by karpenter in our case), the next JobManager is incapable of getting into a stable state and is stuck in a crash loo

Flink k8s operator starts wrong job config from FlinkSessionJob

2024-06-26 Thread Peter Klauke
Hi all, we're running a session cluster and I submit around 20 jobs to it at the same time by creating FlinkSessionJob Kubernetes resources. After sufficient time there are 20 jobs created and running healthy. However, it appears that some jobs are started with the wrong config. As a result some j

Re: Autoscaling with flink-k8s-operator 1.8.0

2024-05-02 Thread Chetas Joshi
Hi Gyula, Thanks for getting back and explaining the difference in the responsibilities of the autoscaler and the operator. I figured out what the issue was. Here is what I was trying to do: the autoscaler had initially down-scaled (2->1) the flinkDeployment so there was pipeline.jobvertex-parall

Re: Autoscaling with flink-k8s-operator 1.8.0

2024-05-01 Thread Gyula Fóra
Hi Chetas, The operator logic itself would normally call the rescale api during the upgrade process, not the autoscaler module. The autoscaler module sets the correct config with the parallelism overrides, and then the operator performs the regular upgrade cycle (as when you yourself change someth

Autoscaling with flink-k8s-operator 1.8.0

2024-05-01 Thread Chetas Joshi
Hello, We recently upgraded the operator to 1.8.0 to leverage the new autoscaling features ( https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/custom-resource/autoscaler/). The FlinkDeployment (application cluster) is set to flink v1_18 as well. I am able to observ

Flink K8s operator ignores taskmanager.host setting

2023-11-29 Thread Salva Alcántara
Hi! I'm deploying a job via the Flink K8s Operator with these settings in the FlinkDeployment resource: ``` spec: flinkConfiguration: taskmanager.host: 0.0.0.0 <-- ignored / not applied ``` When I look into the flink-conf.yaml in the TM the setting is not there. Is there an

Re: Using Flink k8s operator on OKD

2023-10-05 Thread Gyula Fóra
t;>> resources: >>> - deployments >>> - deployments/finalizers >>> verbs: >>> - '*' >>> --- >>> apiVersion: rbac.authorization.k8s.io/v1 >>> kind: RoleBinding >>> metadata: >>> labels: >>&g

Re: Using Flink k8s operator on OKD

2023-10-05 Thread Krzysztof Chmielewski
/finalizers >> verbs: >> - '*' >> --- >> apiVersion: rbac.authorization.k8s.io/v1 >> kind: RoleBinding >> metadata: >> labels: >> app.kubernetes.io/name: flink-kubernetes-operator >> app.kubernetes.io/version: 1.5.0 >> name

Re: Using Flink k8s operator on OKD

2023-09-20 Thread Krzysztof Chmielewski
; name: flink-role-binding > roleRef: > apiGroup: rbac.authorization.k8s.io > kind: Role > name: flink > subjects: > - kind: ServiceAccount > name: flink > EOF > > Hopefully that helps. > > > On Tue, Sep 19, 2023 at 5:40 PM Krzysztof Chmielewski < >

Re: Using Flink k8s operator on OKD

2023-09-19 Thread Zach Lorimer
want to have Flink deployments in. kubectl apply -f - < wrote: > Hi community, > I was wondering if anyone tried to deploy Flink using Flink k8s operator > on machine where OKD [1] is installed? > > We have tried to install Flink k8s operator version 1.6 which seems to > succeed

Using Flink k8s operator on OKD

2023-09-19 Thread Krzysztof Chmielewski
Hi community, I was wondering if anyone tried to deploy Flink using Flink k8s operator on machine where OKD [1] is installed? We have tried to install Flink k8s operator version 1.6 which seems to succeed, however when we try to deploy simple Flink deployment we are getting an error. 2023-09-19

Re: Flink K8S operator does not support IPv6

2023-09-05 Thread Xiaolong Wang
FYI, adding environment variables of ` KUBERNETES_DISABLE_HOSTNAME_VERIFICATION=true` works for me. This env variable needs to be added to both the Flink operator and the Flink job definition. On Tue, Aug 8, 2023 at 12:03 PM Xiaolong Wang wrote: > Ok, thank you. > > On Tue, Aug 8, 2023 at 11:22

Re: flink k8s operator - problem with patching seession cluster

2023-08-31 Thread Gyula Fóra
How to add TM to Flink Session cluster via Java K8s client if Session >>> Cluster has running jobs? >>> >>> Thanks, >>> Krzysztof >>> >>> pt., 25 sie 2023 o 23:48 Krzysztof Chmielewski < >>> krzysiek.chmielew...@gmail.com> napisał

Re: flink k8s operator - problem with patching seession cluster

2023-08-31 Thread Krzysztof Chmielewski
l.com> napisał(a): >> >>> Hi community, >>> I have a use case where I would like to add an extra TM) to a running >>> Flink session cluster that has Flink jobs deployed. Session cluster >>> creation, job submission and cluster patching is done using flink k8s

Re: flink k8s operator - problem with patching seession cluster

2023-08-31 Thread Gyula Fóra
wski < > krzysiek.chmielew...@gmail.com> napisał(a): > >> Hi community, >> I have a use case where I would like to add an extra TM) to a running >> Flink session cluster that has Flink jobs deployed. Session cluster >> creation, job submission and cluster pat

Re: flink k8s operator - problem with patching seession cluster

2023-08-30 Thread Krzysztof Chmielewski
isał(a): > Hi community, > I have a use case where I would like to add an extra TM) to a running > Flink session cluster that has Flink jobs deployed. Session cluster > creation, job submission and cluster patching is done using flink k8s > operator Java API. The Details of this are pre

flink k8s operator - problem with patching seession cluster

2023-08-25 Thread Krzysztof Chmielewski
Hi community, I have a use case where I would like to add an extra TM) to a running Flink session cluster that has Flink jobs deployed. Session cluster creation, job submission and cluster patching is done using flink k8s operator Java API. The Details of this are presented here [1] I would like

Re: Flink k8s operator - managde from java microservice

2023-08-16 Thread Yaroslav Tkachenko
ink jobs using Apache Flink > k8s operator. > Where actions like job submission (new and from save point), Job cancel > with save point, cluster creations will be triggered from Java based micro > service. > > Is there any recommended/Dedicated Java API for Flink k8s operator? >

Flink k8s operator - managde from java microservice

2023-08-16 Thread Krzysztof Chmielewski
Hi, I have a use case where I would like to run Flink jobs using Apache Flink k8s operator. Where actions like job submission (new and from save point), Job cancel with save point, cluster creations will be triggered from Java based micro service. Is there any recommended/Dedicated Java API for

Re: Flink K8S operator does not support IPv6

2023-08-07 Thread Xiaolong Wang
Ok, thank you. On Tue, Aug 8, 2023 at 11:22 AM Peter Huang wrote: > We will handle it asap. Please check the status of this jira > https://issues.apache.org/jira/browse/FLINK-32777 > > On Mon, Aug 7, 2023 at 8:08 PM Xiaolong Wang > wrote: > >> Hi, >> >> I was testing flink-kubernetes-operator i

Re: Flink K8S operator does not support IPv6

2023-08-07 Thread Peter Huang
We will handle it asap. Please check the status of this jira https://issues.apache.org/jira/browse/FLINK-32777 On Mon, Aug 7, 2023 at 8:08 PM Xiaolong Wang wrote: > Hi, > > I was testing flink-kubernetes-operator in an IPv6 cluster and found out > the below issues: > > *Caused by: javax.net.ssl.

Flink K8S operator does not support IPv6

2023-08-07 Thread Xiaolong Wang
Hi, I was testing flink-kubernetes-operator in an IPv6 cluster and found out the below issues: *Caused by: javax.net.ssl.SSLPeerUnverifiedException: Hostname > fd70:e66a:970d::1 not verified:* > > *certificate: sha256/EmX0EhNn75iJO353Pi+1rClwZyVLe55HN3l5goaneKQ=* > > *DN: CN=kube-apiserve

Re: TM fails to register with JM while trying to run basic.yaml example with Flink K8S operator

2023-07-19 Thread Orkhan Dadashov
I have fixed the issue by increasing the CPU and memory for my JM and TM pods. Make sure your instance type can accommodate the required resources. On Wed, 19 Jul 2023 at 13:35, Orkhan Dadashov wrote: > Hi Flink users, > > I'm following up on this guide to try the Flink K8S o

TM fails to register with JM while trying to run basic.yaml example with Flink K8S operator

2023-07-19 Thread Orkhan Dadashov
Hi Flink users, I'm following up on this guide to try the Flink K8S operator (1.5.0 version ): https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/try-flink-kubernetes-operator/quick-start/ When I try to deploy a basic example, JM and TM start, but TM fai

TM fails to register with JM while trying to run basic.yaml example with Flink K8S operator

2023-07-19 Thread Orkhan Dadashov
Hi Flink users, I'm following up on this guide to try the Flink K8S operator (1.5.0 version ): https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/try-flink-kubernetes-operator/quick-start/ When I try to deploy a basic example, JM and TM start, but TM fai

Re: [Flink K8s Operator] Trigger nonce missing for manual savepoint info

2023-07-12 Thread Gyula Fóra
Maybe you have inconsistent operator / CRC versions? In any case I highly recommend upgrading to the lates operator version to get all the bug / security fixes and improvements. Gyula On Wed, 12 Jul 2023 at 10:58, Paul Lam wrote: > Hi, > > I’m using K8s operator 1.3.1 with Flink 1.15.2 on 2 K8s

[Flink K8s Operator] Trigger nonce missing for manual savepoint info

2023-07-12 Thread Paul Lam
Hi, I’m using K8s operator 1.3.1 with Flink 1.15.2 on 2 K8s clusters. Weird enough, on one K8s cluster the flink deployments would show savepoint trigger nonce. while the flink deployments on the other cluster wouldn’t. The normal output is as follows: ``` Last Savepoint: Format

Re: [Flink K8s Operator] Automatic cleanup of terminated deployments

2023-05-21 Thread Paul Lam
Hi Gyula, Thank you and sorry for the late response. My use case is that users may run finite jobs (either batch jobs or finite stream jobs), leaving a lot of deprecated flink deployments around. I’ve filed a ticket[1]. [1] https://issues.apache.org/jira/browse/FLINK-32143 Best, Paul Lam >

Re: [Flink K8s Operator] Automatic cleanup of terminated deployments

2023-05-14 Thread Gyula Fóra
There is no such feature currently, Kubernetes resources usually do not delete themselves :) The problem I see here is by deleting the resource you lose all information about what happened, you won't know if it failed or completed etc. What is the use-case you are thinking about? If this is someth

[Flink K8s Operator] Automatic cleanup of terminated deployments

2023-05-14 Thread Paul Lam
Hi all, Currently, if a job turns into terminated status (e.g. FINISHED or FAILED), the flinkdeployment remains until a manual cleanup is performed. I went through the docs but did not find any way to clean them up automatically. Am I missing something? Thanks! Best, Paul Lam

Re: Flink K8s operator pod section of CRD

2023-02-24 Thread Őrhidi Mátyás
Yep! Simple oversight, it was :/ Cheers, Matyas On Thu, Feb 23, 2023 at 10:54 PM Gyula Fóra wrote: > Hey! > You are right, these fields could have been of the PodTemplate / > PodTemplateSpec type (probably PodTemplateSpec is actually better). > > I think the reason why we used it is two fold:

Re: Flink K8s operator pod section of CRD

2023-02-23 Thread Gyula Fóra
Hey! You are right, these fields could have been of the PodTemplate / PodTemplateSpec type (probably PodTemplateSpec is actually better). I think the reason why we used it is two fold: - Simple oversight :) - Flink itself "expects" the podtemplate in this form for the native integration as you c

Flink K8s operator pod section of CRD

2023-02-23 Thread Mason Chen
Hi all, Why does the FlinkDeployment CRD refer to the Pod class instead of the PodTemplate class from the fabric8 library? As far as I can tell, the only difference is that the Pod class exposes the PodStatus, which doesn't seem mutable. Thanks in advance! Best, Mason

Re: [Flink K8s Operator] flinkdep stays in DEPLOYED and never turns STABLE

2022-12-06 Thread Paul Lam
fecycleState enum if you want a single condensed status view . > > Cheers > Gyula > > On Tue, 6 Dec 2022 at 04:12, Paul Lam <mailto:paullin3...@gmail.com>> wrote: > Hi all, > > I’m trying out Flink K8s operator 1.2 with K8s 1.25 and Flink 1.15. > > I found kub

Re: [Flink K8s Operator] flinkdep stays in DEPLOYED and never turns STABLE

2022-12-06 Thread Gyula Fóra
want a single condensed status view . Cheers Gyula On Tue, 6 Dec 2022 at 04:12, Paul Lam wrote: > Hi all, > > I’m trying out Flink K8s operator 1.2 with K8s 1.25 and Flink 1.15. > > I found kubectl shows that flinkdeployments stay in DEPLOYED like forever > (the Flink job

[Flink K8s Operator] flinkdep stays in DEPLOYED and never turns STABLE

2022-12-06 Thread Paul Lam
Hi all, I’m trying out Flink K8s operator 1.2 with K8s 1.25 and Flink 1.15. I found kubectl shows that flinkdeployments stay in DEPLOYED like forever (the Flink job status are RUNNING), but the operator logs shows that the flinkdeployments already turned into STABLE. Is that a known bug or

Re: [Flink K8s operator] HA metadata not available to restore from last state

2022-11-22 Thread Dongwon Kim
; should eliminate most of these cases. > > Cheers, > Gyula > > On Tue, Nov 22, 2022 at 9:43 AM Dongwon Kim wrote: > >> Hi, >> >> While using a last-state upgrade mode on flink-k8s-operator-1.2.0 and >> flink-1.14.3, we're occasionally facing the foll

Re: [Flink K8s operator] HA metadata not available to restore from last state

2022-11-22 Thread Gyula Fóra
hould eliminate most of these cases. Cheers, Gyula On Tue, Nov 22, 2022 at 9:43 AM Dongwon Kim wrote: > Hi, > > While using a last-state upgrade mode on flink-k8s-operator-1.2.0 and > flink-1.14.3, we're occasionally facing the following error: > > Status: >> Clu

[Flink K8s operator] HA metadata not available to restore from last state

2022-11-22 Thread Dongwon Kim
Hi, While using a last-state upgrade mode on flink-k8s-operator-1.2.0 and flink-1.14.3, we're occasionally facing the following error: Status: > Cluster Info: > Flink - Revision: 98997ea @ 2022-01-08T23:23:54+01:00 > Flink - Version: 1.

Re: status no clear when deploying batch job with flink-k8s-operator

2022-10-25 Thread Gyula Fóra
eploying a flink batch job with flink-k8s-operator. > My flink-k8s-operator's version is 1.2.0 and flink's version is 1.14.6. I > found after the batch job execute finish, the jobManagerDeploymentStatus > field became "MISSING" in FlinkDeployment crd. And the err

status no clear when deploying batch job with flink-k8s-operator

2022-10-25 Thread Liting Liu (litiliu)
  Hi, I'm deploying a flink batch job with flink-k8s-operator. My flink-k8s-operator's version is 1.2.0 and flink's version is 1.14.6. I found after the batch job execute finish, the jobManagerDeploymentStatus field became "MISSING" in FlinkDeployment crd. A

Re: fail to mount hadoop-config-volume when using flink-k8s-operator

2022-10-13 Thread Yang Wang
adoop conf directory exists in the image. For flink-k8s-operator, another feasible solution is to create a hadoop-config-configmap manually and then use *"kubernetes.hadoop.conf.config-map.name <http://kubernetes.hadoop.conf.config-map.name>" *to mount it to JobManager and TaskManager

fail to mount hadoop-config-volume when using flink-k8s-operator

2022-10-12 Thread Liting Liu (litiliu)
Hi, community: I'm using flink-k8s-operator v1.2.0 to deploy flink job. And the "HADOOP_CONF_DIR" environment variable was setted in the image that i buiilded from flink:1.15. I found the taskmanager pod was trying to mount a volume named "hadoop-config-volume"

Re: Flink k8s Operator on AWS?

2022-06-27 Thread Matt Casters
. I chalk all that up to just me lacking a bit of experience with k8s. That being said... It's all working now and I documented the deployment over here: https://hop.apache.org/manual/next/pipeline/beam/flink-k8s-operator-running-hop-pipeline.html A big thank you to everyone that helped m

Re: Flink k8s Operator on AWS?

2022-06-26 Thread Yang Wang
Could you please share the JobManager logs of failed deployment? It will also help a lot if you could show the pending pod status via "kubectl describe ". Given that the current Flink Kubernetes Operator is built on top of native K8s integration[1], the Flink ResourceManager should allocate enough

Re: Flink k8s Operator on AWS?

2022-06-24 Thread Matt Casters
Yes of-course. I already feel a bit less intelligent for having asked the question ;-) The status now is that I managed to have it all puzzled together. Copying the files from s3 to an ephemeral volume takes all of 2 seconds so it's really not an issue. The cluster starts and our fat jar and Ap

Re: Flink k8s Operator on AWS?

2022-06-24 Thread Őrhidi Mátyás
Hi Matt, Yes. There are several official Flink images with various JVMs including Java 11. https://hub.docker.com/_/flink Cheers, Matyas On Fri, Jun 24, 2022 at 2:06 PM Matt Casters wrote: > Hi Mátyás & all, > > Thanks again for the advice so far. On a related note I noticed Java 8 > being us

Re: Flink k8s Operator on AWS?

2022-06-24 Thread Matt Casters
Hi Mátyás & all, Thanks again for the advice so far. On a related note I noticed Java 8 being used, indicated in the log. org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - JAVA_HOME: /usr/local/openjdk-8 Is there a way to use Java 11 to start Flink with? Kind regards, Matt On

Re: Flink k8s Operator on AWS?

2022-06-23 Thread Yang Wang
Thanks for your valuable inputs. To make deploying Flink on K8s easy as a normal Java application is certainly the mission of Flink Kubernetes Operator. Obviously, we are still a little far from this mission. Back to the user jars download, I think it makes sense to introduce the artifact fetcher

Re: Flink k8s Operator on AWS?

2022-06-22 Thread Matt Casters
Hi Yang, Thanks for the suggestion! I looked into this volume sharing on EKS yesterday but I couldn't figure it out right away. The way that people come into the Apache Hop project is often with very little technical knowledge since that's sort of the goal of the project: make things easy. Follo

Re: Flink k8s Operator on AWS?

2022-06-22 Thread Matt Casters
Hi Matyas, Again thank you very much for the information. I'm a beginner and all the help is really appreciated. After some diving into the script behind s3-artifiact-fetcher I kind of figured it out. Have an folder sync'ed into the pod container of the task manager. Then I guess we should be

Re: Flink k8s Operator on AWS?

2022-06-21 Thread Yang Wang
Matyas and Gyula have shared many great informations about how to make the Flink Kubernetes Operator work on the EKS. One more input about how to prepare the user jars. If you are more familiar with K8s, you could use persistent volume to provide the user jars and them mount the volume to JobManag

Re: Flink k8s Operator on AWS?

2022-06-21 Thread Őrhidi Mátyás
Hi Matt, I believe an artifact fetcher (e.g https://hub.docker.com/r/agiledigital/s3-artifact-fetcher ) + the pod template ( https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/#pod-template) is an elegant way to solve your problem. The operato

Re: Flink k8s Operator on AWS?

2022-06-21 Thread Matt Casters
Thank you very much for the help Matyas and Gyula! I just saw a video today where you were presenting the FKO. Really nice stuff! So I'm guessing we're executing "flink run" at some point on the master and that this is when we need the jar file to be local? Am I right in assuming that this happe

Re: Flink k8s Operator on AWS?

2022-06-21 Thread Gyula Fóra
A small addition to what Matyas has said: The limitation of only supporting local scheme is coming from the Flink Kubernetes Application mode directly and is not related to the operator itself. Once this feature is added to Flink itself the operator can also support it for newer Flink versions. G

Re: Flink k8s Operator on AWS?

2022-06-21 Thread Őrhidi Mátyás
Hi Matt, - In FlinkDeployments you can utilize an init container to download your artifact onto a shared volume, then you can refer to it as local:/.. from the main container. FlinkDeployments comes with pod template support https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/do

Flink k8s Operator on AWS?

2022-06-21 Thread Matt Casters
Hi Flink team! I'm interested in getting the new Flink Kubernetes Operator to work on AWS EKS. Following the documentation I got pretty far. However, when trying to run a job I got the following error: Only "local" is supported as schema for application mode. This assumes t > hat the jar is loc

Re: multiple pipeline deployment using flink k8s operator

2022-06-01 Thread Yang Wang
Best, Yang Sigalit Eliazov 于2022年6月1日周三 14:54写道: > Hi all, > we just started using the flink k8s operator to deploy our flink cluster. > From what we understand we are only able to start a flink cluster per job. > So in our case when we have 2 jobs we have to create 2 different clusters

multiple pipeline deployment using flink k8s operator

2022-05-31 Thread Sigalit Eliazov
Hi all, we just started using the flink k8s operator to deploy our flink cluster. >From what we understand we are only able to start a flink cluster per job. So in our case when we have 2 jobs we have to create 2 different clusters. obviously we would prefer to deploy these 2 job which relate