Re: Flink Metric isBackPressured not available

2021-04-13 Thread Claude M
> > The metric is registered upon task deployment and reported periodically. > > Which Flink version are you using? The metric was added in 1.10. > Are you checking it in the UI? > > Regards, > Roman > > On Fri, Apr 9, 2021 at 8:50 PM Claude M wrote: > >

Flink Metric isBackPressured not available

2021-04-09 Thread Claude M
Hello, The documentation here https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html states there is an isBackPressured metric available, yet I don't see it. Any ideas why? Thanks
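A quick way to tell whether the gauge is registered at all, independent of the UI, is to ask the JobManager's REST API for it. The sketch below assumes the vertex metrics endpoint `/jobs/<jobId>/vertices/<vertexId>/metrics`, where subtask-level metrics are addressed with the subtask index as a prefix (e.g. `0.isBackPressured`). The REST address, job ID and vertex ID are placeholders you would take from `/jobs` and `/jobs/<jobId>`, and the response is printed raw rather than parsed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: query the isBackPressured gauge of one subtask through the
// JobManager REST API. Host, job ID and vertex ID are placeholders.
class BackPressureQuery {

    // Builds e.g. http://localhost:8081/jobs/<job>/vertices/<vertex>/metrics?get=0.isBackPressured
    static URI metricUri(String restBase, String jobId, String vertexId, int subtaskIndex) {
        String metric = subtaskIndex + ".isBackPressured";
        String query = "get=" + URLEncoder.encode(metric, StandardCharsets.UTF_8);
        return URI.create(restBase + "/jobs/" + jobId + "/vertices/" + vertexId + "/metrics?" + query);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: BackPressureQuery <restBase> <jobId> <vertexId>");
            return;
        }
        URI uri = metricUri(args[0], args[1], args[2], 0);
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        // Print status code and raw JSON body as returned by the JobManager.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```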

Flink Metrics emitted from a Kubernetes Application Cluster

2021-04-08 Thread Claude M
Hello, I've set up Flink as an Application Cluster in Kubernetes. Now I'm looking into monitoring the Flink cluster in Datadog. The following is configured in flink-conf.yaml to emit metrics: metrics.scope.jm: flink.jobmanager metrics.scope.jm.job: flink.jobmanager.job metrics.scope.tm: flink
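For reference, Flink builds a metric's name by filling the `<variable>` placeholders of the matching scope format and appending the metric name with the delimiter (`.` by default). The class below is a simplified, hypothetical model of that expansion, not Flink's own code. It shows that a constant scope such as `flink.jobmanager` gives every JobManager the same prefix, while the default task scope, which is where task metrics such as `isBackPressured` live, includes host, TaskManager ID, job name, task name and subtask index.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating how a scope format such as
// "<host>.taskmanager.<tm_id>" expands into a metric identifier.
// This is a simplified model, not Flink's own implementation.
class ScopeFormatDemo {

    private static final Pattern VARIABLE = Pattern.compile("<([a-z_]+)>");

    static String identifier(String scopeFormat, Map<String, String> variables, String metricName) {
        Matcher m = VARIABLE.matcher(scopeFormat);
        StringBuilder scope = new StringBuilder();
        while (m.find()) {
            // Unknown variables are left as-is so they are easy to spot.
            String value = variables.getOrDefault(m.group(1), m.group());
            m.appendReplacement(scope, Matcher.quoteReplacement(value));
        }
        m.appendTail(scope);
        // Scope and metric name are joined with the default delimiter ".".
        return scope + "." + metricName;
    }

    public static void main(String[] args) {
        // A constant scope (as in the flink-conf.yaml above) yields one shared prefix for every JobManager.
        System.out.println(identifier("flink.jobmanager", Map.of(), "Status.JVM.CPU.Load"));
        // The default task scope includes host, TaskManager ID, job, task and subtask index.
        System.out.println(identifier(
                "<host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>",
                Map.of("host", "tm-0", "tm_id", "abc123", "job_name", "myJob",
                        "task_name", "Map", "subtask_index", "0"),
                "isBackPressured"));
    }
}
```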

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-03-31 Thread Claude M
that the snapshot was created by checking the actual folder? > > Best, > Matthias > > On Wed, Mar 31, 2021 at 4:56 AM Claude M wrote: > >> Hello, >> >> I have Flink set up as an Application Cluster in Kubernetes, using Flink >> version 1.12. I created a savepoint
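On the question of checking the actual folder: a completed savepoint directory contains a `_metadata` file. A minimal sketch, assuming the savepoint directory is reachable as a local or mounted path (an S3 or HDFS location would need the matching client instead); `SavepointCheck` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: verify that a savepoint directory on a mounted file system
// contains the _metadata file that Flink writes for a savepoint.
class SavepointCheck {

    static boolean looksComplete(Path savepointDir) {
        Path metadata = savepointDir.resolve("_metadata");
        return Files.isRegularFile(metadata);
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(dir + (looksComplete(dir) ? " has" : " lacks") + " a _metadata file");
    }
}
```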

Restoring from Flink Savepoint in Kubernetes not working

2021-03-30 Thread Claude M
Hello, I have Flink set up as an Application Cluster in Kubernetes, using Flink version 1.12. I created a savepoint using the curl command and the status indicated it was completed. I then tried to relaunch the job from that savepoint using the following arguments as indicated in the doc found h
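For context, the REST flow behind the curl command and the restore arguments can be sketched as below. It assumes the `POST /jobs/<jobId>/savepoints` trigger endpoint (polled afterwards via `GET /jobs/<jobId>/savepoints/<request-id>`) and the `standalone-job` entrypoint's `--fromSavepoint` and `--allowNonRestoredState` options from the standalone Kubernetes docs. The class name, job class and paths are placeholders. When restoring, the path passed to `--fromSavepoint` should be the location reported in the completed trigger's status response.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two halves of a savepoint round trip in a standalone
// application cluster. Paths, class names and IDs are placeholders.
class SavepointRoundTrip {

    // Body for POST /jobs/<jobId>/savepoints. No JSON escaping is done,
    // which is fine for plain paths without quotes or backslashes.
    static String triggerBody(String targetDirectory, boolean cancelJob) {
        return "{\"target-directory\": \"" + targetDirectory + "\", \"cancel-job\": " + cancelJob + "}";
    }

    // Container args for the JobManager when relaunching from a savepoint.
    static List<String> restoreArgs(String jobClass, String savepointPath, boolean allowNonRestored) {
        List<String> args = new ArrayList<>(List.of(
                "standalone-job", "--job-classname", jobClass, "--fromSavepoint", savepointPath));
        if (allowNonRestored) {
            // Skips state that no longer maps to an operator in the new job graph.
            args.add("--allowNonRestoredState");
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(triggerBody("s3://bucket/savepoints", false));
        System.out.println(restoreArgs("com.example.MyJob", "s3://bucket/savepoints/savepoint-abc", false));
    }
}
```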

Flink failing to restore from checkpoint

2021-03-29 Thread Claude M
Hello, I executed a Flink job in a Kubernetes Application cluster w/ four taskmanagers. The job was running fine for several hours but then crashed w/ the following exception, which seems to occur when restoring from a checkpoint. The UI shows the following for the checkpoint counts: Triggered: 6

Re: Kubernetes Application Cluster Not Working

2021-03-29 Thread Claude M
-jobmanager"? > You could set "jobmanager.rpc.address" to flink-jobmanager in the > ConfigMap. > > Best, > Yang > > On Wed, Mar 24, 2021 at 10:22 AM Guowei Ma wrote: > >> Hi, M >> Could you give the full stack? This might not be the root cause. >> Best
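Following the suggestion to point `jobmanager.rpc.address` at `flink-jobmanager`, one can first confirm from inside a TaskManager pod that the Service name resolves and its ports accept connections. A minimal sketch using plain sockets; it assumes Flink's default ports (6123 for `jobmanager.rpc.port`, 8081 for `rest.port`), which may differ in your ConfigMap, and `JobManagerProbe` is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: from inside a TaskManager pod, check that the JobManager service
// name resolves and that a port accepts TCP connections.
class JobManagerProbe {

    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // An unresolvable host gives an unresolved address, and connect()
            // then throws UnknownHostException (an IOException).
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "flink-jobmanager";
        // 6123 is Flink's default jobmanager.rpc.port, 8081 the default rest.port.
        for (int port : new int[] {6123, 8081}) {
            System.out.println(host + ":" + port + " reachable=" + reachable(host, port, 2000));
        }
    }
}
```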

Kubernetes Application Cluster Not Working

2021-03-23 Thread Claude M
Hello, I'm trying to set up Flink in Kubernetes using the Application Mode as described here: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes The doc mentions that there needs to be a service exposing the JobManager’s REST and UI port

Timeout Exception When Producing/Consuming Messages to Hundreds of Topics

2021-03-01 Thread Claude M
Hello, I'm trying to run an experiment w/ two Flink jobs: - A producer producing messages to hundreds of topics - A consumer consuming the messages from all the topics After running for a few minutes, the job fails w/ the following error: Caused by: org.apache.kafka.common.errors.Timeou
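One factor that may be worth ruling out with hundreds of topics is the producer's record buffer: every partition with pending data holds its own batch, so the total can exceed `buffer.memory`, after which `send()` blocks for up to `max.block.ms` and then fails. The arithmetic below is a rough worst-case estimate using Kafka's default `batch.size` (16384 bytes) and `buffer.memory` (33554432 bytes). The topic and partition counts are hypothetical, and this is not a diagnosis of the error above.

```java
// Back-of-the-envelope check, not a diagnosis: with many topics, each partition
// that has pending records holds its own batch in the producer's buffer.
class ProducerBufferEstimate {

    static final long DEFAULT_BATCH_SIZE = 16_384L;        // Kafka default batch.size (bytes)
    static final long DEFAULT_BUFFER_MEMORY = 33_554_432L; // Kafka default buffer.memory (bytes)

    // Bytes needed if every partition has one open batch at the same time.
    static long worstCaseBatchBytes(int topics, int partitionsPerTopic, long batchSize) {
        return (long) topics * partitionsPerTopic * batchSize;
    }

    public static void main(String[] args) {
        int topics = 600;            // hypothetical workload
        int partitionsPerTopic = 4;  // hypothetical workload
        long needed = worstCaseBatchBytes(topics, partitionsPerTopic, DEFAULT_BATCH_SIZE);
        System.out.printf("open batches need ~%d bytes of %d available%n", needed, DEFAULT_BUFFER_MEMORY);
        if (needed > DEFAULT_BUFFER_MEMORY) {
            System.out.println("buffer.memory may be exhausted; send() can then block up to max.block.ms");
        }
    }
}
```

With 600 topics of 4 partitions each, the worst case is 39,321,600 bytes, which is above the 33,554,432-byte default buffer.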

Re: Producer Configuration

2021-02-27 Thread Claude M
renikhun wrote: > Can you produce messages using Kafka console producer connect using same > properties ? > > ------ > *From:* Claude M > *Sent:* Saturday, February 27, 2021 8:05 AM > *To:* Alexey Trenikhun > *Cc:* user > *Subject:* Re: Producer Co

Re: Producer Configuration

2021-02-27 Thread Claude M
ike > you didn’t set it > > ------ > *From:* Claude M > *Sent:* Friday, February 26, 2021 12:02:10 PM > *To:* user > *Subject:* Producer Configuration > > Hello, > > I created a simple Producer and when the jo

Producer Configuration

2021-02-26 Thread Claude M
Hello, I created a simple Producer and when the job ran, it was getting the following error: Caused by: org.apache.kafka.common.errors.TimeoutException I read about increasing request.timeout.ms, so I added the following properties: Properties properties = new Properties(); properties.s
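As background on raising `request.timeout.ms`: since Kafka 2.1 the producer also has `delivery.timeout.ms`, and `KafkaProducer` throws a `ConfigException` if that value is explicitly set below `linger.ms + request.timeout.ms` (if it is left unset, the producer raises it automatically). The sketch below builds the properties as plain `java.util.Properties` and checks that constraint. The broker address and all values are examples, and the fallback defaults in `timeoutsConsistent` are those of Kafka 2.x clients.

```java
import java.util.Properties;

// Sketch of producer timeout settings as plain java.util.Properties.
// The broker address is a placeholder; the values are examples, not recommendations.
class ProducerTimeouts {

    static Properties producerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("request.timeout.ms", "60000");   // per-request wait for a broker response
        props.setProperty("delivery.timeout.ms", "180000"); // total time allowed for a send, incl. retries
        props.setProperty("linger.ms", "5");
        props.setProperty("max.block.ms", "120000");        // how long send() may block (metadata, full buffer)
        return props;
    }

    // Mirrors the producer's check: delivery.timeout.ms >= linger.ms + request.timeout.ms.
    // Fallbacks are Kafka 2.x client defaults.
    static boolean timeoutsConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms", "120000"));
        long linger = Long.parseLong(props.getProperty("linger.ms", "0"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms", "30000"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = producerProperties("kafka:9092");
        System.out.println("consistent=" + timeoutsConsistent(props));
    }
}
```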

Flink Datadog Timeout

2021-02-02 Thread Claude M
Hello, I have a Flink jobmanager and taskmanagers deployed in a Kubernetes cluster. I integrated it with Datadog by specifying the following in flink-conf.yaml: metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter metrics.reporter.dghttp.apikey: However

Re: Error while retrieving the leader gateway after making Flink config changes

2020-11-04 Thread Claude M
Manager? > > > > On Tue, Nov 3, 2020 at 7:06 PM Claude M wrote: > >> Thanks for your reply Robert. Please see attached log from the job >> manager, the last line is the only thing I see different from a pod that >> starts up successfully. >> >> On Tue,

Re: Error while retrieving the leader gateway after making Flink config changes

2020-11-03 Thread Claude M
dual pods with a changed > memory configuration. Can you share the full Jobmanager log of the failed > restart attempt? > > I don't think that the log statement you've posted explains a start > failure. > > Regards, > Robert > > On Tue, Nov 3, 2020 at 2:33 AM C

Error while retrieving the leader gateway after making Flink config changes

2020-11-02 Thread Claude M
Hello, I have Flink 1.10.2 installed in a Kubernetes cluster. Any time I make a change to flink-conf.yaml, the Flink jobmanager pod fails to restart. For example, I modified the following memory setting in flink-conf.yaml: jobmanager.memory.flink.size. After I deploy the change, the pod fails to rest

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-24 Thread Claude M
> > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.10/flinkDev/building.html > > > > On Wed, Sep 23, 2020 at 8:29 PM Claude M wrote: > >> It was mentioned that this issue may be fixed in 1.10.3 but there is no >> 1.10.3 docker image here: https:/

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
It was mentioned that this issue may be fixed in 1.10.3 but there is no 1.10.3 Docker image here: https://hub.docker.com/_/flink On Wed, Sep 23, 2020 at 7:14 AM Claude M wrote: > In regards to the metaspace memory issue, I was able to get a heap dump > and the following is the

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
occupy *6,615,416 (18.76%)* bytes. Based on this, I'm not clear on what needs to be done to solve this. On Tue, Sep 22, 2020 at 3:10 PM Claude M wrote: > Thanks for your responses. > 1. There were no job restarts prior to the metaspace OOM. > 2. I tried increasing the CPU req

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-22 Thread Claude M
starts would load new >> classes, then expand the metaspace, and finally OOM happens. >> >> >> >> ## Leader retrieving >> >> Constant restarts may be heavy for jobmanager, if JM CPU resources are >> not enough, the thread for leader retrieving may be stuck.

metaspace out-of-memory & error while retrieving the leader gateway

2020-09-18 Thread Claude M
Hello, I upgraded from Flink 1.7.2 to 1.10.2. One of the jobs running on the task managers is periodically crashing w/ the following error: java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JV
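To see whether metaspace keeps growing across restarts (which would point at a class-loading leak) or stays flat but close to its maximum (which would suggest `taskmanager.memory.jvm-metaspace.size` is simply too small), metaspace usage can be sampled with the standard `java.lang.management` API. A minimal sketch: it must run inside the JVM being measured, for example logged from a job, and it assumes a HotSpot JVM, where the pool is named `Metaspace`. `MetaspaceUsage` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Sketch: read the current JVM's Metaspace pool usage via the standard
// management API. Sampling it periodically shows whether it keeps growing.
class MetaspaceUsage {

    static Optional<MemoryUsage> metaspace() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot names this pool "Metaspace"; other JVMs may not expose it.
            if ("Metaspace".equals(pool.getName())) {
                return Optional.of(pool.getUsage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // getMax() returns -1 when no MaxMetaspaceSize limit is set.
        metaspace().ifPresentOrElse(
                u -> System.out.printf("metaspace used=%d committed=%d max=%d%n",
                        u.getUsed(), u.getCommitted(), u.getMax()),
                () -> System.out.println("no Metaspace pool found on this JVM"));
    }
}
```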