Re: Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-07-18 Thread Ron Crocker
I just stumbled on this same problem without any associated ZK issues. We had a Kafka broker fail that caused this issue: 2018-07-18 02:48:13,497 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Produce: (2/4) (7e7d61b286d90c51bbd20a15796633f2) switched from RUNNING

Re: Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-05-16 Thread Tony Wei
Hi Ufuk, Piotr, Thanks for all of your replies. I know that jobs are cancelled if the JM loses its connection to ZK, but the JM didn't lose its connection in my case. My job failed because of an exception from KafkaProducer. However, the TM lost its ZK connection both before and after that exception

Re: Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-05-15 Thread Piotr Nowojski
Hi, It looks like there was an error in the asynchronous job that sends the records to Kafka. This is probably collateral damage from losing the connection to ZooKeeper. Piotrek > On 15 May 2018, at 13:33, Ufuk Celebi wrote: > > Hey Tony, > > thanks for the detailed report. > > - In Flink 1.4, jo

Question about the behavior of TM when it lost the zookeeper client session in HA mode

2018-05-13 Thread Tony Wei
Hi all, Recently, my Flink job hit a problem that caused it to fail and restart. The log looks like this screen snapshot, or this: ``` 2018-05-11 13:21:04,582 WARN org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 610