Hi, thanks for the quick response,
The pipelines we are running are:
One pipeline doing simple conversions, reading from Kafka and writing back to Kafka
Two pipelines that work with a DB (one writes and one reads)
Two pipelines that work with BigQueryIO (both writing data; a simplified sketch is below)
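For context, roughly what the BigQuery pipelines look like (broker, topic and table names are placeholders and the conversion step is trimmed down, so this is only a sketch, not our real code):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BigQueryPipelineSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromKafka", KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")            // placeholder broker
            .withTopic("events")                           // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
     .apply("ToTableRow", MapElements.via(new SimpleFunction<KV<String, String>, TableRow>() {
       @Override
       public TableRow apply(KV<String, String> record) {
         // placeholder conversion; the real pipelines build the row from the message payload
         return new TableRow().set("payload", record.getValue());
       }
     }))
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")          // placeholder table (assumed to exist)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}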

The ones we usually have problems with are the BigQuery pipelines. It seems 
that when they are running, we can’t deploy more than 4 pipelines. When we 
tried a different set of pipelines (for example, only the DB pipelines, with 6 
of them doing similar things) there was no issue, but when we try the same 
with BigQuery, we are not able to deploy more than 4.
We tried to increase the resources, and also the number of task managers and slots, 
but didn’t see any change.
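
For reference, this is roughly how we set the slots and memory in flink-conf.yaml (the values below are illustrative, not our exact numbers):

# number of slots offered by each task manager
taskmanager.numberOfTaskSlots: 2
# total memory per task manager / job manager process
taskmanager.memory.process.size: 4096m
jobmanager.memory.process.size: 2048m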

In the jobmanager logs after several minutes we saw this exception:
java.util.concurrent.TimeoutException: Invocation of public abstract 
java.util.concurrent.CompletableFuture 
org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.requestSlot(org.apache.flink.runtime.clusterframework.types.SlotID,org.apache.flink.api.common.JobID,org.apache.flink.runtime.clusterframework.types.AllocationID,org.apache.flink.runtime.clusterframework.types.ResourceProfile,java.lang.String,org.apache.flink.runtime.resourcemanager.ResourceManagerId,org.apache.flink.api.common.time.Time)
 timed out.
Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka.tcp://flink@10.1.8.70:6122/user/rpc/taskmanager_0#421157124]] after 
[10000 ms]. Message of type 
[org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation]. A typical reason 
for `AskTimeoutException` is that the recipient actor didn't send a reply.

But I am not entirely sure this is related, as it was printed a while after the 
job was supposed to start (a lot more than 10000 ms).
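
If the RPC timeout itself turns out to be the problem (we are not sure it is), my understanding is that the 10000 ms corresponds to the default akka.ask.timeout, which could be raised in flink-conf.yaml, for example:

# default is 10 s; the value below is just an example
akka.ask.timeout: 60 s

But since the exception appears long after the job should have started, it may only be a symptom.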

thanks

From: Caizhi Weng <tsreape...@gmail.com>
Date: Monday, 15 November 2021 at 3:42
To: Sigalit Eliazov <e.siga...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: pipeline are not started sporadically
Hi!

What state is the not running job in? Is that a random job or a specific job, 
or a specific type of job?

You can also look into the task manager logs for exceptions and anything suspicious. 
For example, it might be that the number of JDBC connections exceeds the limit, 
but this is just a guess given the current information.

Sigalit Eliazov <e.siga...@gmail.com> wrote on Sunday, 14 November 2021 at 22:18:
Hello
We have 5 different pipelines running on a standalone Flink cluster.
2 – integrated with a DB using JDBC
2 – integrated with GCP – write to BigQuery
1 – reads from Kafka and writes to Kafka

We are running with
1 job manager
5 task managers – 2 slots on each


Our problem is that only 4 out of the 5 pipelines are starting.
There are no errors or exceptions in the job manager, so it is not clear why.
Also, it seems to be sporadic: each time we restart, one of the pipelines is 
not running.
We tried to increase the number of task managers and even change the memory settings.

Did you encounter such an issue? The most disturbing thing is that we don’t see 
any exceptions…


Thanks
S
