Hi Gordon,

My job was only down for around 2-3 hours, and I'm using the default
Kinesis retention of 24 hours. When I restored the job, it hit this
exception after around 15 minutes, then restarted and hit the same
exception roughly 15 minutes later, and so on. However, after this had
happened around 5 times, the job fully caught up to the head of the stream
and started running smoothly again.
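
As a sanity check on the exception quoted below, the gap between the two
timestamps in the message can be computed directly. This is only an
illustrative sketch: the variable names are made up, and the only inputs are
the two timestamps and the 300000 ms tolerance taken from the error text.

```python
from datetime import datetime, timezone

# Timestamps and tolerance copied from the ExpiredIteratorException message.
created_at = datetime(2016, 8, 26, 10, 47, 47, tzinfo=timezone.utc)
failed_at = datetime(2016, 8, 26, 11, 5, 40, tzinfo=timezone.utc)
tolerated_delay_ms = 300_000

# Age of the iterator at the moment the request was rejected.
iterator_age_ms = int((failed_at - created_at).total_seconds() * 1000)
exceeded = iterator_age_ms > tolerated_delay_ms
```

The iterator was about 18 minutes old when it was rejected, well past the
5 minute tolerance, which is at least consistent with the job failing
roughly 15 minutes after each restart.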

Thanks for looking into this!
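
For what it's worth, one recovery strategy that might apply here (purely a
sketch, not the connector's actual code): when a shard iterator expires,
request a fresh one positioned after the last sequence number that was read,
then continue. FakeShard and consume below are hypothetical stand-ins, not
AWS SDK or Flink APIs, and list indexes stand in for sequence numbers. In the
real Kinesis API the corresponding request would be GetShardIterator with the
AFTER_SEQUENCE_NUMBER iterator type.

```python
class ExpiredIteratorError(Exception):
    """Stand-in for Kinesis's ExpiredIteratorException."""


class FakeShard:
    """Hypothetical in-memory shard whose iterators expire after 300000 ms."""

    TOLERATED_DELAY_MS = 300_000

    def __init__(self, records):
        self.records = list(records)
        self.now_ms = 0  # simulated clock, advanced by the caller

    def get_iterator_after(self, seq):
        # An iterator is (next position to read, creation time).
        return (seq + 1, self.now_ms)

    def get_records(self, iterator, limit):
        pos, created_ms = iterator
        if self.now_ms - created_ms > self.TOLERATED_DELAY_MS:
            raise ExpiredIteratorError("Iterator expired")
        batch = self.records[pos:pos + limit]
        # The returned iterator is freshly created at the current time.
        return batch, (pos + len(batch), self.now_ms)


def consume(shard, process_ms_per_batch, limit=2):
    """Read the whole shard, refreshing the iterator whenever it expires."""
    last_seq = -1  # nothing consumed yet
    refreshes = 0
    consumed = []
    iterator = shard.get_iterator_after(last_seq)
    while last_seq < len(shard.records) - 1:
        try:
            batch, iterator = shard.get_records(iterator, limit)
        except ExpiredIteratorError:
            # Recover instead of failing: resume after the last record read.
            iterator = shard.get_iterator_after(last_seq)
            refreshes += 1
            continue
        consumed.extend(batch)
        last_seq += len(batch)
        shard.now_ms += process_ms_per_batch  # simulate slow processing
    return consumed, refreshes
```

With a simulated 400000 ms of processing per batch, every held iterator goes
stale before its next use, yet all records are still read in order because
each expiry triggers a refresh rather than a job failure.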

Best,
Josh


On Fri, Aug 26, 2016 at 1:57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi Josh,
>
> Thank you for reporting this, I’m looking into it. There were some major
> changes to the Kinesis connector after mid June, but the changes don’t seem
> to be related to the iterator timeout, so it may be a bug that has always
> been there.
>
> I’m not sure yet whether it is related, but may I ask how long your
> Flink job was down before you restarted it from the existing state? Was it
> longer than the retention duration of the Kinesis records (default is 24
> hours)?
>
> Regards,
> Gordon
>
>
> On August 26, 2016 at 7:20:59 PM, Josh (jof...@gmail.com) wrote:
>
> Hi all,
>
> I guess this is probably a question for Gordon - I've been using the
> Flink-Kinesis connector for a while now and have seen this exception a
> couple of times:
>
> com.amazonaws.services.kinesis.model.ExpiredIteratorException: Iterator 
> expired. The iterator was created at time Fri Aug 26 10:47:47 UTC 2016 while 
> right now it is Fri Aug 26 11:05:40 UTC 2016 which is further in the future 
> than the tolerated delay of 300000 milliseconds. (Service: AmazonKinesis; 
> Status Code: 400; Error Code: ExpiredIteratorException; Request ID: 
> d3db1d90-df97-912b-83e1-3954e766bbe0)
>
>
> It happens when my Flink job goes down for a couple of hours, then I restart
> from the existing state and it needs to catch up on all the data that was
> put into the Kinesis stream during the hours when the job was down. The job
> then runs for ~15 mins and fails with this exception (and this happens
> repeatedly - meaning I can't restore the job from the existing state).
>
>
> Any ideas what's causing this? It's possible that it's been fixed in recent 
> commits, as the version of the Kinesis connector I'm using is behind master - 
> I'm not sure exactly what commit I'm using (doh!) but it was built around mid 
> June.
>
>
> Thanks,
>
> Josh
>
>
