Hi, thanks for the quick response.

The pipelines we are running are:
1 pipeline doing simple conversions: read from Kafka and send to Kafka
2 pipelines working with the DB (one writes, one reads)
2 pipelines working with BigQuery.IO (both writing data), roughly along the lines of the sketch below
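For context, assuming these are Beam pipelines running on the Flink runner (which the use of BigQueryIO suggests), the BigQuery ones boil down to something like this minimal sketch. This is not our actual code: the table spec, field names and the stand-in Create source are placeholders.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class BigQueryWriteSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Stand-in source; the real pipelines build TableRow elements from their actual input.
    PCollection<TableRow> rows = p.apply("SampleRows",
        Create.of(new TableRow().set("id", "1").set("value", "example"))
            .withCoder(TableRowJsonCoder.of()));

    // Append rows into an existing table (table spec is a placeholder).
    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")
            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}

In practice a job like this would be submitted with --runner=FlinkRunner and a gs:// tempLocation for the BigQuery load files.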
The ones we usually have problems with are the BigQuery pipelines. It seems that when they are running, we can't deploy more than 4 pipelines. When we tried a different set of pipelines (for example only the DB pipelines, 6 of them doing similar things) there was no issue, but when we try the same with BigQuery we are not able to deploy more than 4.

We tried to increase the resources, and also the number of task managers and slots, but didn't see any change.

In the job manager logs, after several minutes, we saw this exception:

java.util.concurrent.TimeoutException: Invocation of public abstract java.util.concurrent.CompletableFuture org.apache.flink.runtime.taskexecutor.TaskExecutorGateway.requestSlot(org.apache.flink.runtime.clusterframework.types.SlotID,org.apache.flink.api.common.JobID,org.apache.flink.runtime.clusterframework.types.AllocationID,org.apache.flink.runtime.clusterframework.types.ResourceProfile,java.lang.String,org.apache.flink.runtime.resourcemanager.ResourceManagerId,org.apache.flink.api.common.time.Time) timed out.
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@10.1.8.70:6122/user/rpc/taskmanager_0#421157124]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.

But I am not entirely sure this is related, as it was printed a while after the job was supposed to start (a lot more than 10000 ms).

Thanks

From: Caizhi Weng <tsreape...@gmail.com>
Date: Monday, 15 November 2021 at 3:42
To: Sigalit Eliazov <e.siga...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: pipeline are not started sporadically

Hi!

What state is the not-running job in? Is it a random job, a specific job, or a specific type of job?

You can also look into the task managers for exceptions and for anything suspicious in the logs. For example, it might be possible that the number of JDBC connections exceeds the limit, but this is just a guess given the current information.

Sigalit Eliazov <e.siga...@gmail.com> wrote on Sunday, 14 November 2021 at 10:18 PM:

Hello

We have 5 different pipelines running on a standalone Flink cluster:
2 - integrated with a DB using JDBC
2 - integrated with GCP - write to BigQuery
1 - reads from Kafka, writes to Kafka

We are running with 1 job manager and 5 task managers - 2 slots on each (see the flink-conf.yaml sketch below).

Our problem is that only 4 out of the 5 pipelines are starting. There are no errors or exceptions in the job manager, so it is not clear why. It also seems to be sporadic: each time we restart, one of the pipelines is not running.

We tried to increase the number of task managers and even changed memory settings.

Have you encountered such an issue? The most disturbing thing is that we don't see any exception...

Thanks
S
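For completeness, a rough flink-conf.yaml sketch of the cluster settings described above (reconstructed from the description, not copied from the actual file; the 10 s default for akka.ask.timeout matches the 10000 ms in the exception higher up in the thread):

# 5 standalone task managers, each offering 2 slots (10 slots total)
taskmanager.numberOfTaskSlots: 2
# default RPC ask timeout; the AskTimeoutException above fired at exactly this value
akka.ask.timeout: 10 s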