Hi Saiph!

What exactly is the problem you are seeing? Judging from the log, the job
was successfully submitted to the JobManager.
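
On the question of where such settings go: Flink reads them from
conf/flink-conf.yaml. Here is a minimal sketch using the two keys quoted
below. The values and the number-plus-unit duration format are assumptions
for illustration only, so please check them against the configuration page
for your release:

```yaml
# conf/flink-conf.yaml (illustrative values, not recommendations)
# Assumed format: a number followed by a time unit, e.g. "10 s".
akka.watch.heartbeat.interval: 10 s
akka.transport.heartbeat.interval: 1000 s
```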

Stephan



On Fri, Feb 19, 2016 at 8:49 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
> could you send me the full logs of the JobManager (privately, if you
> prefer)? The messages you've posted here are logged at DEBUG level. They
> don't indicate any erroneous behavior of the system.
>
> On Fri, Feb 19, 2016 at 8:44 PM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
>> These were the parameters that I set btw:
>>
>> akka.watch.heartbeat.interval: 100
>> akka.transport.heartbeat.interval: 1000
>>
>> On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa <saiph.ka...@gmail.com>
>> wrote:
>>
>>> I am not sure.
>>>
>>> For that particular machine I get messages like these:
>>> «
>>> myip:6123/user/jobmanager#291801197])) at akka://flink/user/$a from
>>> Actor[akka://flink/deadLetters].
>>> [INFO] o.a.f.r.c.JobClientActor    - Connected to new
>>> JobManager akka.tcp://flink@myip:6123/user/jobmanager.
>>>
>>> [INFO] o.a.f.r.c.JobClientActor    - Sending message to
>>> JobManager akka.tcp://flink@myip:6123/user/jobmanager to submit job
>>> JOB1 (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress
>>>
>>>
>>> [DEBUG] o.a.f.r.c.JobClientActor    - Handled message
>>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
>>> in 48 ms from Actor[akka://flink/deadLetters].
>>>
>>>
>>> [DEBUG] o.a.f.r.c.JobClientActor    - Handled message
>>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
>>> in 48 ms from Actor[akka://flink/deadLetters].
>>>
>>> [DEBUG] o.a.f.r.c.JobClientActor    - Received message
>>> JobSubmitSuccess(2575d5ff5c10336beb7820a052a63623) at akka://flink/user/$a
>>> from Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256].
>>> »
>>>
>>> I tried setting the heartbeat interval on the cluster, but it didn't
>>> solve the problem. Should I try setting it in the client, and if so, how?
>>> I see no other errors or exceptions in the log files.
>>>
>>>
>>>
>>>
>>> On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> Hi Saiph,
>>>>
>>>> are you sure that the jobs are cancelled because the client disconnects?
>>>>
>>>> For the different timeouts, check the configuration page:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html
>>>> and search for "heartbeat".
>>>>
>>>> On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa <saiph.ka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a Flink client application that launches jobs on remote
>>>>> clusters. However, my jobs are getting cancelled:
>>>>> "18:25:29,650 WARN akka.remote.ReliableDeliverySupervisor -
>>>>> Association with remote system [akka.tcp://flink@127.0.0.1:52929] has
>>>>> failed, address is now gated for [5000] ms. Reason is: [Disassociated]."
>>>>>
>>>>> How can I increase the Akka heartbeat interval? Where should I set
>>>>> that configuration parameter, in the client or in the Flink clusters,
>>>>> and in which file?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>
>>>
>>
>
