Thank you, Fabian. I implemented a quick test based on what you suggested, using an offset from system time, and it did improve things: with offset = 500ms the problem is completely gone. With offset = 50ms, I still had around 3-5 files missed out of 10,000. Those remaining misses might come from the difference between the clocks of the EC2 instance and S3.
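For anyone following along, here is a rough sketch of how I understand the offset idea. It is plain Python, not Flink code, and the names (`OffsetFileMonitor`, `scan`, `offset_ms`) are made up for illustration. The assumption is that each scan only picks up files whose modification time is at most "now minus offset", so files that become visible slightly late still fall into a later scan window.

```python
from dataclasses import dataclass


@dataclass
class OffsetFileMonitor:
    """Hypothetical monitor, for illustration only (not a Flink class)."""

    offset_ms: int
    # Files with a modification time <= upper_bound_ms were already considered.
    upper_bound_ms: int = -1

    def scan(self, listing, now_ms):
        """Return new paths from `listing` (path -> modification time in ms)."""
        # Hold back the most recent `offset_ms` of files until a later scan.
        new_bound = now_ms - self.offset_ms
        if new_bound <= self.upper_bound_ms:
            return []
        picked = sorted(
            path
            for path, mod_ms in listing.items()
            if self.upper_bound_ms < mod_ms <= new_bound
        )
        self.upper_bound_ms = new_bound
        return picked


monitor = OffsetFileMonitor(offset_ms=500)

# First scan at t=1500: only files modified up to t=1000 are eligible.
first = monitor.scan({"a": 1000, "b": 1400}, now_ms=1500)

# "c" was modified at t=1300 but only shows up in the listing now.
# Because the bound was held back, it is still picked up.
second = monitor.scan({"a": 1000, "b": 1400, "c": 1300}, now_ms=2000)
```

In this sketch a file is still missed if its visibility delay plus any clock skew is larger than the offset, which would be consistent with what I saw at 50ms.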
I will now try to implement exactly what you suggested, and open a Jira issue as well. Thanks for your help.