Thanks for the info, will give it a try.
BTW - We're using Hadoop 2.7 on AWS EMR 4.4.0.
On Thu, Mar 17, 2016 at 5:55 PM, Ken Krugler wrote:
> With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://),
> which should be more reliable (though some bug fixes aren't available until
> 2
The default timeout for opening a split is 5 minutes. You can set a higher
value with "taskmanager.runtime.fs_timeout" (milliseconds), but I believe
that 5 minutes is already way too long.
It would be interesting to find out the root cause of this.
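Since the value is given in milliseconds, here is a minimal sketch of how a higher timeout could be computed and written out. It assumes the setting goes into a YAML-style config file as a "key: value" line; the helper name fs_timeout_entry is hypothetical and only the key name comes from the message above.

```python
# Key name as quoted in the message above.
FS_TIMEOUT_KEY = "taskmanager.runtime.fs_timeout"


def fs_timeout_entry(minutes: int) -> str:
    """Hypothetical helper: build a 'key: value' config line.

    The value is expressed in milliseconds, as the message states.
    The 'key: value' layout is an assumption about the config file format.
    """
    millis = minutes * 60 * 1000  # minutes -> milliseconds
    return f"{FS_TIMEOUT_KEY}: {millis}"


if __name__ == "__main__":
    # Example: raise the timeout from the 5-minute default to 10 minutes.
    print(fs_timeout_entry(10))
```

Raising the value only hides the symptom; as noted above, the root cause of the slow split open is still worth tracking down.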
On Thu, Mar 17, 2016 at 11:00 PM, Sourigna Phetsa
With Hadoop 2.6 or later, you can use the s3a:// protocol (vs. s3n://), which
should be more reliable (though some bug fixes aren't available until 2.7, see
https://issues.apache.org/jira/browse/HADOOP-11571).
You can then also set these properties to control timeouts:
>
> fs.s3a.connecti