Hi Gordon,

My job was only down for around 2-3 hours, and I'm using the default Kinesis retention of 24 hours. When I restored the job, it hit this exception after around 15 minutes (then restarted, hit the same exception 15 minutes later, and so on) - but I found that after this happened around 5 times, the job fully caught up to the head of the stream and started running smoothly again.
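In case it helps illustrate what I think is going on: Kinesis shard iterators expire 5 minutes (300000 ms) after they are minted, so if the consumer holds an iterator without calling GetRecords in time, the next call fails. Below is a minimal sketch of the refresh-on-expiry pattern - on an expired-iterator error, re-request a fresh iterator from the last sequence number that was successfully processed and retry, instead of failing the whole job. This is Python with a fake client standing in for the AWS SDK; all class and method names here are made up for illustration, not the connector's actual API.

```python
# Hypothetical stand-in for a Kinesis client, for illustration only:
# iterators become invalid once older than the 5-minute server-side limit.

class ExpiredIteratorError(Exception):
    """Mimics com.amazonaws...ExpiredIteratorException."""


class FakeKinesisClient:
    ITERATOR_TTL_SECONDS = 300  # Kinesis tolerates at most 5 minutes

    def __init__(self):
        self._iterators = {}   # iterator token -> (sequence_number, created_at)
        self._next_token = 0

    def get_shard_iterator(self, after_sequence_number, now):
        # Mint a fresh iterator positioned just after the given sequence number.
        token = "it-%d" % self._next_token
        self._next_token += 1
        self._iterators[token] = (after_sequence_number, now)
        return token

    def get_records(self, iterator, now):
        seq, created = self._iterators[iterator]
        if now - created > self.ITERATOR_TTL_SECONDS:
            raise ExpiredIteratorError("iterator expired")
        # Return one record and the iterator for the next call.
        next_seq = seq + 1
        return [next_seq], self.get_shard_iterator(next_seq, now)


def fetch_with_refresh(client, iterator, last_seq, now):
    """One consume step: on an expired iterator, re-seed a fresh one from
    the last successfully processed sequence number and retry once."""
    try:
        return client.get_records(iterator, now)
    except ExpiredIteratorError:
        fresh = client.get_shard_iterator(last_seq, now)
        return client.get_records(fresh, now)
```

The key point is that the recovery needs the last processed sequence number per shard, which the connector should already be tracking for checkpointing, so transparently refreshing the iterator seems preferable to letting the exception bubble up and restart the job.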
Thanks for looking into this!

Best,
Josh

On Fri, Aug 26, 2016 at 1:57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

> Hi Josh,
>
> Thank you for reporting this, I’m looking into it. There were some major
> changes to the Kinesis connector after mid June, but the changes don’t seem
> to be related to the iterator timeout, so it may be a bug that had always
> been there.
>
> I’m not sure yet if it may be related, but may I ask how long your
> Flink job was down before you restarted it from the existing state? Was it
> longer than the retention duration of the Kinesis records (default is 24
> hours)?
>
> Regards,
> Gordon
>
> On August 26, 2016 at 7:20:59 PM, Josh (jof...@gmail.com) wrote:
>
> Hi all,
>
> I guess this is probably a question for Gordon - I've been using the
> Flink-Kinesis connector for a while now and have seen this exception a
> couple of times:
>
> com.amazonaws.services.kinesis.model.ExpiredIteratorException: Iterator
> expired. The iterator was created at time Fri Aug 26 10:47:47 UTC 2016 while
> right now it is Fri Aug 26 11:05:40 UTC 2016, which is further in the future
> than the tolerated delay of 300000 milliseconds. (Service: AmazonKinesis;
> Status Code: 400; Error Code: ExpiredIteratorException; Request ID:
> d3db1d90-df97-912b-83e1-3954e766bbe0)
>
> It happens when my Flink job goes down for a couple of hours and I then
> restart it from the existing state, so it needs to catch up on all the data
> that was put in the Kinesis stream while the job was down. The job then
> runs for ~15 mins and fails with this exception (and this happens
> repeatedly, meaning I can't restore the job from the existing state).
>
> Any ideas what's causing this? It's possible that it's been fixed in recent
> commits, as the version of the Kinesis connector I'm using is behind master -
> I'm not sure exactly what commit I'm using (doh!), but it was built around
> mid June.
>
> Thanks,
>
> Josh