Re: TM occasionally hang in deploying state in Flink 1.5

2019-05-07 Thread qi luo
Thanks Dawid, I’ve created an issue for this: https://jira.apache.org/jira/browse/FLINK-12426. Although we’re using 1.5, this may affect later versions as well. I’m still investigating the root cause, but have no result yet. This happens occasionally and isn'

Re: TM occasionally hang in deploying state in Flink 1.5

2019-04-25 Thread Dawid Wysakowicz
Hi, Feel free to open a JIRA for this issue. By the way, have you investigated the root cause of the hang? Best, Dawid On 25/04/2019 08:55, qi luo wrote: > Hello, > > This issue occurred again and we dumped the TM thread. It indeed hung > on socket read to download jar from Blob serve

Re: TM occasionally hang in deploying state in Flink 1.5

2019-04-25 Thread qi luo
Hello, This issue occurred again and we took a thread dump of the TM. It was indeed hung on a socket read while downloading a jar from the Blob server: "DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) (1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable [0x7fb97cfbf000
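For readers unfamiliar with this failure mode, here is a minimal, generic Java sketch of why a blocking socket read can hang forever and how a read timeout bounds it. It is not Flink's BlobClient code and makes no claim about how Flink 1.5 configures its sockets; the class name `BoundedSocketRead`, the helper `readTimesOut`, and the timeout values are hypothetical and chosen only for illustration. The "silent server" in `main` stands in for a peer that accepts a connection but never sends data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not taken from Flink.
public class BoundedSocketRead {

    /**
     * Connects to host:port and attempts to read one byte.
     * Returns true if the read timed out, false if a byte or EOF arrived.
     */
    static boolean readTimesOut(String host, int port,
                                int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Bound the connection attempt itself.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Without SO_TIMEOUT (the default is 0), read() blocks indefinitely
            // if the peer never sends anything and never closes the connection.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                // The read gave up after readTimeoutMs instead of hanging.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A listening socket that never calls accept() and never writes:
        // the OS completes the TCP handshake, but no data ever arrives.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            boolean timedOut = readTimesOut(
                    "127.0.0.1", silentServer.getLocalPort(), 1000, 500);
            System.out.println("read timed out: " + timedOut);
        }
    }
}
```

With a read timeout in place, the caller gets an exception it can handle (for example by retrying the download) rather than a thread that stays in a socket read with no upper bound.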

TM occasionally hang in deploying state in Flink 1.5

2019-04-19 Thread qi luo
Hi all, We use Flink 1.5 for batch processing and start thousands of jobs per day. Occasionally we observe stuck jobs because some TMs hang in the “DEPLOYING” state. The TM log shows that they are stuck downloading jars in BlobClient: ... INFO org.apache.flink.runtime.taskexecutor.TaskExecuto