These were the parameters that I set, btw:

akka.watch.heartbeat.interval: 100
akka.transport.heartbeat.interval: 1000

On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:

> I am not sure.
>
> For that particular machine I get messages like these:
> «
> myip:6123/user/jobmanager#291801197])) at akka://flink/user/$a from
> Actor[akka://flink/deadLetters].
> [INFO] o.a.f.r.c.JobClientActor - Connected to new
> JobManager akka.tcp://flink@myip:6123/user/jobmanager.
>
> [INFO] o.a.f.r.c.JobClientActor - Sending message to
> JobManager akka.tcp://flink@myip:6123/user/jobmanager to submit job JOB1
> (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress
>
> [DEBUG] o.a.f.r.c.JobClientActor - Handled message
> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
> in 48 ms from Actor[akka://flink/deadLetters].
>
> [DEBUG] o.a.f.r.c.JobClientActor - Handled message
> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
> in 48 ms from Actor[akka://flink/deadLetters].
>
> [DEBUG] o.a.f.r.c.JobClientActor - Received message
> JobSubmitSuccess(2575d5ff5c10336beb7820a052a63623) at akka://flink/user/$a
> from Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256].
> »
>
> I tried to set the heartbeat interval in the cluster but it didn't solve
> the problem, should I try to set it in the client (how can I do it)? I see
> no other errors or exceptions on the log files.
>
> On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hi Saiph,
>>
>> are you sure that the jobs are cancelled because the client disconnects?
>>
>> For the different timeouts, check the configuration page:
>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html
>> and search for "heartbeat".
>>
>> On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa <saiph.ka...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a Flink client application that launches jobs to remote clusters.
>>> However I'm getting my jobs cancelled:
>>> "18:25:29,650 WARN
>>> akka.remote.ReliableDeliverySupervisor - Association
>>> with remote system [akka.tcp://flink@127.0.0.1:52929] has failed,
>>> address is now gated for [5000] ms. Reason is: [Disassociated]."
>>>
>>> How can I increase the akka heartbeat interval? Where should I set that
>>> configuration parameter, in the client or in the Flink clusters, and in
>>> which file.
>>>
>>> Thanks.