Re: Testing on Flink 1.5

2018-04-24 Thread Gary Yao
Hi Amit, web.timeout should only affect RPC calls originating from the REST API. In FLIP-6, the submission of the job graph happens via HTTP. The value under akka.ask.timeout is still used as the default timeout for RPC calls [1][2]. Since you also had custom heartbeat settings, you should consid

Re: Testing on Flink 1.5

2018-04-20 Thread Amit Jain
Hi Gary, This setting has resolved the issue. Does it increase the timeout for all RPC calls or only for specific components? We had the following settings in Flink 1.3.2 and they did the job for us.
akka.watch.heartbeat.pause: 600 s
akka.client.timeout: 5 min
akka.ask.timeout: 120 s
-- Thanks, Amit
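For readers following the thread, the distinction between the two timeouts discussed here can be sketched as a flink-conf.yaml fragment. This is only an illustration: the akka.ask.timeout value is the one quoted in the message above, while the web.timeout value is a hypothetical placeholder, since the thread does not state which value was actually used.

```yaml
# Timeout for asynchronous operations started through the REST API / web
# monitor, in milliseconds. The value below is a hypothetical placeholder,
# not the one used in this thread.
web.timeout: 120000

# Default timeout for Akka RPC calls (value quoted from the Flink 1.3.2
# settings in the message above).
akka.ask.timeout: 120 s
```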

Re: Testing on Flink 1.5

2018-04-19 Thread Gary Yao
Hi Amit, Thank you for the follow-up. What you describe sounds like a bug, but I am not able to reproduce it. Can you open an issue in Jira with an outline of your code and how you submit the job? > Could you also recommend us the best practice in FLIP6, should we use YARN session or submit jobs i

Re: Testing on Flink 1.5

2018-04-19 Thread Amit Jain
Hi Gary, We found the underlying issue behind the following problem. A few of our jobs are stuck with logs [1]. These jobs are only able to allocate a JM and cannot get any TM, even though there are ample resources on our cluster. We are running ETL merge job here. In this job, we first find new deltas a