Re: Could not resolve ResourceManager address

2022-09-21 Thread Rommel Holmes
Resolved it. If it is non-HA, don't pass in POD_IP. The error message is not clear. On Wed, Sep 21, 2022 at 9:12 AM Rommel Holmes wrote: > > Hello > > I am trying to make a Flink application deployment in k8s, but the error > message shows that the task manager can't

Could not resolve ResourceManager address

2022-09-21 Thread Rommel Holmes
Hello, I am trying to make a Flink application deployment in k8s, but the error message shows that the task manager can't resolve the resource manager address: *Could not resolve ResourceManager address akka.tcp://flink@flink-jm-svc-streaming-job:6123/user/rpc/resourcemanager_*, retrying in 1 ms:

Flink + K8s

2021-11-02 Thread Rommel Holmes
Hi, From my understanding, when I set Flink in HA mode in K8s, I don't need to set up more than 1 job manager, because once the job manager dies, K8s will restart it for me. Is that the correct understanding, or for HA purposes do I still need to set up more than 1 job manager? Thanks. Rommel

Re: Flink S3 Presto Checkpointing Permission Forbidden

2021-10-08 Thread Rommel Holmes
You already have the S3 request ID, so you can easily reach out to AWS tech support to find out which account was used to write to S3. I guess that account probably doesn't have permission to do the following: "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket" Then grant the account with the

Re: Resource when using HA setup on K8s with Flink 1.13.2

2021-10-01 Thread Rommel Holmes
Never mind, I set up HA wrong. Now it is running with only 112 task managers with the correct setup. Thanks. On Fri, Oct 1, 2021 at 11:46 AM Rommel Holmes wrote: > Hi, > > I had one Flink job running on K8s with 1 job manager and 112 task > managers with slot number 1. The job can run

Resource when using HA setup on K8s with Flink 1.13.2

2021-10-01 Thread Rommel Holmes
Hi, I had one Flink job running on K8s with 1 job manager and 112 task managers with slot number 1. The job can run without any issue. After I set up HA on the job with 2 job managers, I noticed that the

Re: Potential bug when assuming roles from AWS EKS when using S3 as RocksDb checkpoint backend?

2021-09-27 Thread Rommel Holmes
Hi Ingo, I was looking into the AWS dependencies, and from https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html the minimum required version to use the feature is 1.11.704. So 1.11.788 should be sufficient? Can you point me to where it says that 1.1

PoJo to Avro Serialization throw KryoException: java.lang.UnsupportedOperationException

2021-06-22 Thread Rommel Holmes
My unit test was running OK under Flink 1.11.2 with parquet-avro 1.10.0. Once I upgraded to 1.12.0 with parquet-avro 1.12.0, my unit test throws com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap

Re: NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-22 Thread Rommel Holmes
To give more information: with parquet-avro version 1.10.0 and Flink 1.11.2, it was running fine. Now, with Flink 1.12.1, the error message shows up. Thank you for the help. Rommel On Tue, Jun 22, 2021 at 2:41 PM Thomas Wang wrote: > Hi, > > We recently upgraded our Flink version from 1.11.2 to 1.12.1 a

Re: Resource Planning

2021-06-16 Thread Rommel Holmes
Hi Xintong and Robert, Thanks for the reply. The checkpoint size for our job is 10-20GB since we are doing incremental checkpointing; if we do a savepoint, it can be as big as 150GB. 1) We will try to make the Flink instance bigger. 2) Thanks for the pointer, we will take a look. 3) We do have CPU