Re: Error restoring from checkpoint on Flink 1.8

2019-04-24 Thread Till Rohrmann
For future reference, here is a cross-link to the referenced mailing list thread [1]. [1] http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E Cheers, Till On Wed, Apr 24, 2019 at 4:00 AM Ning Shi wrote: > Hi Congxian, > > I think I have figured

Re: Error restoring from checkpoint on Flink 1.8

2019-04-23 Thread Ning Shi
Hi Congxian, I think I have figured out the issue. It's related to the checkpoint directory collision issue you responded to in the other thread. We reproduced this bug on 1.6.1 after unchaining the operators. There are two stateful operators in the chain, one is a CoBroadcastWithKeyedOperator, t

Re: Error restoring from checkpoint on Flink 1.8

2019-04-22 Thread Ning Shi
Congxian, Thanks for the reply. I will try to put together a minimal reproducer and post it to this thread soon. Ning On Sun, 21 Apr 2019 09:27:12 -0400, Congxian Qiu wrote: > > Hi, > From the given error message, this seems flink can't open RocksDB because > of the number of column family mismatch, do

Re: Error restoring from checkpoint on Flink 1.8

2019-04-21 Thread Congxian Qiu
Hi, From the given error message, it seems Flink can't open RocksDB because of a mismatch in the number of column families. Do you mind sharing a minimal job that can reproduce this problem? Best, Congxian On Sun, Apr 21, 2019 at 10:56 AM, Ning Shi wrote: > For clarification, one of the operators in the chain me

Re: Error restoring from checkpoint on Flink 1.8

2019-04-20 Thread Ning Shi
For clarification, one of the operators in the chain mentioned in the error message is a KeyedBroadcastProcessFunction, which I believe creates an InternalTimerService implicitly. That might be why "_timer_state" appears in this operator chain. However, it is still a mystery to me why it worked in