Re: PartitionNotFoundException when restarting from checkpoint

2018-03-15, Stephan Ewen
…diamath.com > From: Seth Wiesman; Date: Wednesday, March 14, 2018 at 10:14 AM; To: Fabian Hueske, Stefan Richter <s.rich...@data-artisans.com>; Cc: user@flink.apache.org; Subject: Re: PartitionNotFoundException when re…

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-14, Seth Wiesman
…Subject: Re: PartitionNotFoundException when restarting from checkpoint. Unfortunately, the stack trace was swallowed by the Java timer in the LocalInputChannel[1]; the real error is forwarded to the main thread, but I couldn't figure out how to see it in my logs. However, I believe I am c…

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-14, Seth Wiesman
…at 8:02 PM; To: Seth Wiesman, Stefan Richter; Cc: user@flink.apache.org; Subject: Re: PartitionNotFoundException when restarting from checkpoint. Hi Seth, Thanks for sharing how you resolved the problem! The problem might have been related to Flink's key groups, which are used to…

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-13, Fabian Hueske
…th.com > From: Seth Wiesman; Date: Friday, March 9, 2018 at 11:53 AM; To: user@flink.apache.org; Subject: PartitionNotFoundException when restarting from checkpoint. Hi, We are running Flink 1.4.0 wi…

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-13, Seth Wiesman
…Subject: PartitionNotFoundException when restarting from checkpoint. Hi, We are running Flink 1.4.0 with a YARN deployment on EC2 instances, RocksDB, and incremental checkpointing. Last night a job failed and became stuck in a restart cycle with a PartitionNotFoundException. We tried restarting from the checkpoint on a fre…

PartitionNotFoundException when restarting from checkpoint

2018-03-09, Seth Wiesman
Hi, We are running Flink 1.4.0 with a YARN deployment on EC2 instances, RocksDB, and incremental checkpointing. Last night a job failed and became stuck in a restart cycle with a PartitionNotFoundException. We tried restarting from the checkpoint on a fresh Flink session with no luck. Looking through the log…