Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Sourigna Phetsarath
Thanks for the info, will give it a try. BTW, we're using Hadoop 2.7 on AWS EMR 4.4.0.

On Thu, Mar 17, 2016 at 5:55 PM, Ken Krugler wrote:
> With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://),
> which should be more reliable (though some bug fixes aren't available until
> 2

Re: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Robert Metzger
The default timeout for opening a split is 5 minutes. You can set a higher value with "taskmanager.runtime.fs_timeout" (milliseconds), but I believe that 5 minutes is already way too long. It would be interesting to find out the root cause of this.

On Thu, Mar 17, 2016 at 11:00 PM, Sourigna Phetsa
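Since "taskmanager.runtime.fs_timeout" is given in milliseconds, it is easy to get the unit wrong when raising it. A minimal sketch of the conversion follows, assuming the value goes into flink-conf.yaml as a plain `key: value` entry. The helper name `fs_timeout_line` and the 10-minute figure are hypothetical and only for illustration; they are not from the thread.

```python
# Key name as quoted in the thread; the helper below is a hypothetical illustration.
FS_TIMEOUT_KEY = "taskmanager.runtime.fs_timeout"


def fs_timeout_line(minutes: float) -> str:
    """Return a flink-conf.yaml style entry with the timeout converted to milliseconds."""
    if minutes < 0:
        raise ValueError("timeout must be non-negative")
    millis = int(minutes * 60 * 1000)  # minutes -> seconds -> milliseconds
    return f"{FS_TIMEOUT_KEY}: {millis}"


if __name__ == "__main__":
    # Example: a 10-minute timeout (an arbitrary value chosen for illustration).
    print(fs_timeout_line(10))  # prints "taskmanager.runtime.fs_timeout: 600000"
```

As noted in the message, a higher value only hides the symptom; finding the root cause of the slow split opening is probably the better fix.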

RE: S3 Timeouts with lots of Files Using Flink 0.10.2

2016-03-19 Thread Ken Krugler
With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://), which should be more reliable (though some bug fixes aren't available until 2.7, see https://issues.apache.org/jira/browse/HADOOP-11571). You can then also set these properties to control timeouts:

> > fs.s3a.connecti