Hi Saiph! What is the problem that is happening? The log actually looks like the Job is successfully sent to the JobManager.
Stephan On Fri, Feb 19, 2016 at 8:49 PM, Robert Metzger <rmetz...@apache.org> wrote: > Hi, > can you maybe (if you want also private) send me the full logs of the > jobmanager? The messages you've posted here are logged at DEBUG level. They > don't indicate an erroneous behavior of the system. > > On Fri, Feb 19, 2016 at 8:44 PM, Saiph Kappa <saiph.ka...@gmail.com> > wrote: > >> These were the parameters that I set btw: >> >> akka.watch.heartbeat.interval: 100 >> akka.transport.heartbeat.interval: 1000 >> >> On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa <saiph.ka...@gmail.com> >> wrote: >> >>> I am not sure. >>> >>> For that particular machine I get messages like these: >>> « >>> myip:6123/user/jobmanager#291801197])) at akka://flink/user/$a from >>> Actor[akka://flink/deadLetters]. >>> ^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor - Connected to new >>> JobManager akka.tcp://flink@myip:6123/user/jobmanager. >>> >>> ^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor - Sending message to >>> JobManager akka.tcp://flink@myip:6123/user/jobmanager to submit job >>> JOB1 (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress >>> >>> >>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Handled message >>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197])) >>> in 48 ms from Actor[akka://flink/deadLetters]. >>> >>> >>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Handled message >>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197])) >>> in 48 ms from Actor[akka://flink/deadLetters]. >>> >>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Received message >>> JobSubmitSuccess(2575d5ff5c10336beb7820a052a63623) at akka://flink/user/$a >>> from Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256]. >>> » >>> >>> I tried to set the heartbeat interval in the cluster but it didn't solve >>> the problem, should I try to set it in the client (how can I do it)? I see >>> no other errors or exceptions on the log files. >>> >>> >>> >>> >>> On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger <rmetz...@apache.org> >>> wrote: >>> >>>> Hi Saiph, >>>> >>>> are you sure that the jobs are cancelled because the client disconnects? >>>> >>>> For the different timeouts, check the configuration page: >>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html >>>> and search for "heartbeat". >>>> >>>> On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa <saiph.ka...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I have a Flink client application that launches jobs to remote >>>>> clusters. However I'm getting my jobs cancelled: >>>>> "18:25:29,650 WARN >>>>> akka.remote.ReliableDeliverySupervisor - >>>>> Association >>>>> with remote system [akka.tcp://flink@127.0.0.1:52929] has failed, >>>>> address is now gated for [5000] ms. Reason is: [Disassociated]." >>>>> >>>>> How can I increase the akka heartbeat interval? Where should I set >>>>> that configuration parameter, in the client or in the Flink clusters, and >>>>> in which file. >>>>> >>>>> Thanks. >>>>> >>>>> >>>> >>> >> >