Hello,
I upgraded from Flink 1.7.2 to 1.10.2. One of the jobs running on the task
managers is periodically crashing w/ the following error:
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error
has occurred. This can mean two things: either the job requires a larger
size of JVM metaspace to load classes or there is a class loading leak.
>> Repeated job restarts would load new
>> classes, then expand the metaspace, and finally OOM happens.
>>
>>
>>
>> ## Leader retrieving
>>
>> Constant restarts may be heavy for the JobManager; if JM CPU resources are
>> not enough, the thread for leader retrieving may get stuck.
... occupy 6,615,416 (18.76%) bytes.
Based on this, I'm not clear on what needs to be done to solve this.
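For reference, the first case named by the error message corresponds to the
metaspace size option in flink-conf.yaml; a minimal sketch, where the value is
a placeholder rather than a recommendation:

    # flink-conf.yaml -- enlarge the TaskManager JVM metaspace (placeholder value)
    taskmanager.memory.jvm-metaspace.size: 256m

The second case, a class-loading leak, would not be fixed by this and instead
shows up as metaspace usage that keeps growing across job restarts.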
On Tue, Sep 22, 2020 at 3:10 PM Claude M wrote:
> Thanks for your responses.
> 1. There were no job restarts prior to the metaspace OOM.
> 2. I tried increasing the CPU request
It was mentioned that this issue may be fixed in 1.10.3 but there is no
1.10.3 docker image here: https://hub.docker.com/_/flink
On Wed, Sep 23, 2020 at 7:14 AM Claude M wrote:
> In regards to the metaspace memory issue, I was able to get a heap dump
> and the following is the
>
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/flinkDev/building.html
>
>
>
> On Wed, Sep 23, 2020 at 8:29 PM Claude M wrote:
>
>> It was mentioned that this issue may be fixed in 1.10.3 but there is no
>> 1.10.3 docker image here: https://hub.docker.com/_/flink
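The building guide linked above at [2] essentially amounts to a Maven build of
the desired branch; a rough sketch, assuming a git checkout of the
release-1.10 branch:

    git clone https://github.com/apache/flink.git
    cd flink
    git checkout release-1.10
    mvn clean install -DskipTests
    # the built distribution ends up under build-target/

The resulting distribution can then be baked into a custom Docker image.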
Hello,
I have Flink 1.10.2 installed in a Kubernetes cluster.
Anytime I make a change to flink-conf.yaml, the Flink jobmanager pod fails
to restart.
For example, I modified the following memory setting in flink-conf.yaml:
jobmanager.memory.flink.size.
After I deploy the change, the pod fails to restart.
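For context, the relevant flink-conf.yaml entry looks roughly like the
following; the value is a placeholder, and the jobmanager.memory.* options
interact, so inconsistent combinations of the total/heap settings can
themselves prevent the JobManager from starting:

    # flink-conf.yaml -- total Flink memory for the JobManager (placeholder value)
    jobmanager.memory.flink.size: 1024m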
> ...individual pods with a changed
> memory configuration. Can you share the full JobManager log of the failed
> restart attempt?
>
> I don't think that the log statement you've posted explains a start
> failure.
>
> Regards,
> Robert
>
> On Tue, Nov 3, 2020 at 2:33 AM Claude M wrote:
Manager?
>
>
>
> On Tue, Nov 3, 2020 at 7:06 PM Claude M wrote:
>
>> Thanks for your reply, Robert. Please see the attached log from the job
>> manager; the last line is the only thing I see that differs from a pod that
>> starts up successfully.
>>
>> On Tue,
Hello,
I have a Flink jobmanager and taskmanagers deployed in a Kubernetes
cluster. I integrated it with Datadog by specifying the following in
flink-conf.yaml:
metrics.reporter.dghttp.class:
org.apache.flink.metrics.datadog.DatadogHttpReporter
metrics.reporter.dghttp.apikey:
However
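A fuller reporter block typically looks along these lines; the apikey and
tags values here are placeholders:

    metrics.reporter.dghttp.class: org.apache.flink.metrics.datadog.DatadogHttpReporter
    metrics.reporter.dghttp.apikey: <your-datadog-api-key>
    metrics.reporter.dghttp.tags: env:staging,team:myteam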
Hello,
I created a simple Producer and when the job ran, it was getting the
following error:
Caused by: org.apache.kafka.common.errors.TimeoutException
I read about increasing the request.timeout.ms. Thus, I added the
following properties.
Properties properties = new Properties();
properties.s
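A sketch of the kind of timeout overrides being described, with placeholder
values; the broker address is an assumption:

    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "kafka-broker:9092"); // placeholder
    // raise the request timeout (the Kafka producer default is 30s)
    properties.setProperty("request.timeout.ms", "60000");
    // Kafka requires delivery.timeout.ms >= request.timeout.ms + linger.ms
    properties.setProperty("delivery.timeout.ms", "120000");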
> ...looks like
> you didn’t set it
>
> ------
> *From:* Claude M
> *Sent:* Friday, February 26, 2021 12:02:10 PM
> *To:* user
> *Subject:* Producer Configuration
>
> Hello,
>
> I created a simple Producer and when the job ran, it was getting the following error:
renikhun wrote:
> Can you produce messages using the Kafka console producer, connecting with
> the same properties?
>
> ------
> *From:* Claude M
> *Sent:* Saturday, February 27, 2021 8:05 AM
> *To:* Alexey Trenikhun
> *Cc:* user
> *Subject:* Re: Producer Configuration
Hello,
I'm trying to run an experiment w/ two Flink jobs:
- A producer producing messages to hundreds of topics
- A consumer consuming the messages from all the topics
After the job runs for a few minutes, it fails w/ the following error:
Caused by: org.apache.kafka.common.errors.TimeoutException
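For reference, a minimal Flink Kafka producer wiring, simplified to a single
placeholder topic and broker and assuming the universal Kafka connector:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

    public class ProducerSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka-broker:9092"); // placeholder
            props.setProperty("request.timeout.ms", "60000");            // placeholder tuning

            DataStream<String> stream = env.fromElements("a", "b", "c"); // stand-in source
            stream.addSink(new FlinkKafkaProducer<>(
                    "example-topic",              // placeholder topic
                    new SimpleStringSchema(),     // value serialization
                    props));

            env.execute("producer-sketch");
        }
    }

In the actual experiment the sink (or a topic-selecting serialization schema)
would fan out across the hundreds of topics instead of a single one.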
Hello,
I'm trying to set up Flink in Kubernetes using the Application Mode as
described here:
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes
The doc mentions that there needs to be a service exposing the JobManager’s
REST and UI ports.
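A minimal sketch of such a Service, loosely following the standalone
Kubernetes examples; the name, labels and selector are assumptions and must
match the JobManager deployment:

    apiVersion: v1
    kind: Service
    metadata:
      name: flink-jobmanager
    spec:
      type: ClusterIP
      selector:
        app: flink
        component: jobmanager
      ports:
        - name: rpc
          port: 6123
        - name: blob-server
          port: 6124
        - name: webui
          port: 8081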
-jobmanager"?
> You could set "jobmanager.rpc.address" to flink-jobmanager in the
> ConfigMap.
>
> Best,
> Yang
>
> Guowei Ma wrote on Wed, Mar 24, 2021 at 10:22 AM:
>
>> Hi, M
>> Could you give the full stack trace? This might not be the root cause.
>> Best
Hello,
I executed a Flink job in a Kubernetes Application cluster w/ four
taskmanagers. The job was running fine for several hours but then crashed
w/ the following exception, which seems to occur when restoring from a
checkpoint. The UI shows the following checkpoint counts:
Triggered: 6
Hello,
I have Flink setup as an Application Cluster in Kubernetes, using Flink
version 1.12. I created a savepoint using the curl command and the status
indicated it was completed. I then tried to relaunch the job from that
savepoint using the following arguments, as indicated in the doc found
here:
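For reference, the REST calls and launch argument involved look roughly like
this; the host, job id and savepoint paths are placeholders:

    # trigger a savepoint for a running job (returns a trigger id)
    curl -X POST http://<jobmanager>:8081/jobs/<job-id>/savepoints \
         -H "Content-Type: application/json" \
         -d '{"target-directory": "s3://my-bucket/savepoints", "cancel-job": false}'

    # poll until the status is COMPLETED and a savepoint path is returned
    curl http://<jobmanager>:8081/jobs/<job-id>/savepoints/<trigger-id>

    # relaunch the application cluster entrypoint from that path
    --fromSavepoint s3://my-bucket/savepoints/savepoint-xxxxxx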
that the snapshot was created by checking the actual folder?
>
> Best,
> Matthias
>
> On Wed, Mar 31, 2021 at 4:56 AM Claude M wrote:
>
>> Hello,
>>
>> I have Flink setup as an Application Cluster in Kubernetes, using Flink
>> version 1.12. I created a savepoint
Hello,
I've set up Flink as an Application Cluster in Kubernetes. Now I'm looking
into monitoring the Flink cluster in Datadog. This is what is configured
in the flink-conf.yaml to emit metrics:
metrics.scope.jm: flink.jobmanager
metrics.scope.jm.job: flink.jobmanager.job
metrics.scope.tm: flink
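Assuming the remaining scope keys follow the same flat naming pattern used in
the Datadog reporter examples, the rest of the block would look like this
(these lines are an assumed continuation, not the values from the original
mail):

    metrics.scope.tm: flink.taskmanager
    metrics.scope.tm.job: flink.taskmanager.job
    metrics.scope.task: flink.task
    metrics.scope.operator: flink.operator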
Hello,
The documentation here
https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html
states there is an isBackPressured metric available, yet I don't see it. Any
ideas why?
Thanks
> The metric is registered upon task deployment and reported periodically.
>
> Which Flink version are you using? The metric was added in 1.10.
> Are you checking it in the UI?
>
> Regards,
> Roman
>
> On Fri, Apr 9, 2021 at 8:50 PM Claude M wrote:
> >
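One way to check for the metric outside the UI is the metrics REST endpoint;
the job id, vertex id and host below are placeholders:

    # list the metrics registered for a job vertex
    curl http://<jobmanager>:8081/jobs/<job-id>/vertices/<vertex-id>/metrics

    # fetch the value for one subtask
    curl "http://<jobmanager>:8081/jobs/<job-id>/vertices/<vertex-id>/subtasks/0/metrics?get=isBackPressured"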