Hi Josh,

Thank you for reporting this. I’m looking into it. There were some major 
changes to the Kinesis connector after mid-June, but they don’t seem to be 
related to the iterator timeout, so it may be a bug that has always been there.

I’m not sure yet whether it is related, but may I ask how long your Flink job 
was down before you restarted it from the existing state? Was it longer than 
the retention duration of the Kinesis records (default is 24 hours)?
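
For reference, here is a rough sketch (plain Java, not code from the connector) 
that works out how old the iterator was, using the two timestamps in the 
exception quoted below, and compares that age with the 300000 ms tolerance it 
mentions. The class and method names are made up for illustration, and the 
timestamps are the ones from the exception rewritten in ISO-8601 form.

```java
import java.time.Duration;
import java.time.Instant;

public class IteratorAge {
    // Tolerated delay quoted in the exception message (5 minutes).
    static final long TOLERATED_DELAY_MILLIS = 300_000L;

    // Age of the iterator in milliseconds, given ISO-8601 UTC timestamps.
    static long ageMillis(String createdAt, String now) {
        return Duration.between(Instant.parse(createdAt), Instant.parse(now)).toMillis();
    }

    // True if the iterator is older than the tolerated delay.
    static boolean isExpired(String createdAt, String now) {
        return ageMillis(createdAt, now) > TOLERATED_DELAY_MILLIS;
    }

    public static void main(String[] args) {
        // "Fri Aug 26 10:47:47 UTC 2016" and "Fri Aug 26 11:05:40 UTC 2016"
        String createdAt = "2016-08-26T10:47:47Z";
        String now = "2016-08-26T11:05:40Z";

        System.out.println(ageMillis(createdAt, now)); // prints 1073000
        System.out.println(isExpired(createdAt, now)); // prints true
    }
}
```

So the iterator was about 17 minutes 53 seconds old when it was used, well past 
the 5 minute limit.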

Regards,
Gordon


On August 26, 2016 at 7:20:59 PM, Josh (jof...@gmail.com) wrote:

Hi all,

I guess this is probably a question for Gordon - I've been using the 
Flink-Kinesis connector for a while now and seen this exception a couple of 
times:

com.amazonaws.services.kinesis.model.ExpiredIteratorException: Iterator 
expired. The iterator was created at time Fri Aug 26 10:47:47 UTC 2016 while 
right now it is Fri Aug 26 11:05:40 UTC 2016 which is further in the future 
than the tolerated delay of 300000 milliseconds. (Service: AmazonKinesis; 
Status Code: 400; Error Code: ExpiredIteratorException; Request ID: 
d3db1d90-df97-912b-83e1-3954e766bbe0)

It happens when my Flink job goes down for a couple of hours, then I restart 
from the existing state and it needs to catch up on all the data that was put 
into the Kinesis stream during the hours the job was down. The job then runs 
for ~15 mins and fails with this exception (and this happens repeatedly - 
meaning I can't restore the job from the existing state).

Any ideas what's causing this? It's possible that it's been fixed in recent 
commits, as the version of the Kinesis connector I'm using is behind master - 
I'm not sure exactly what commit I'm using (doh!) but it was built around mid 
June.

Thanks,
Josh
