Spark uses the Hadoop filesystems.
I assume you are trying to use s3n://, which under the hood uses the
third-party jets3t library. It is configured through the jets3t.properties
file (google "hadoop s3n jets3t"), which you should put on Spark's classpath.
The setting you are looking for is s3service.[...] object. It has no effect
on the file "/dev/output", which is, as far as S3 is concerned, another
object that happens to share part of its object name with /dev.
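For illustration, a minimal jets3t.properties could look like the following.
s3service.s3-endpoint and s3service.https-only are standard jets3t options,
shown here only as examples; substitute whichever s3service.* setting you
actually need:

    # jets3t.properties -- illustrative values only
    # Must be on Spark's classpath (driver and executors), e.g. in conf/
    s3service.s3-endpoint=s3.amazonaws.com
    s3service.https-only=true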
Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833
On Tue, Jan 27, 2015 at 6:33 AM, Chen, Kevin wrote:
> When spark saves rdd [...]
[...] write the final output by using
a custom OutputCommitter which does not use a temporary location.
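As a minimal sketch of such a committer for the old mapred API (my own
naming; this is not code from Spark or Hadoop), every phase can be a no-op
so that tasks write straight to the final path:

    import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

    // Sketch: a committer that never creates a _temporary directory.
    // All phases are no-ops, so task output lands directly at the final
    // path and no copy/rename is needed on S3. Caveat: there is no
    // cleanup of partial output from failed or speculative tasks.
    class DirectOutputCommitter extends OutputCommitter {
      override def setupJob(jobContext: JobContext): Unit = ()
      override def setupTask(taskContext: TaskAttemptContext): Unit = ()
      override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
      override def commitTask(taskContext: TaskAttemptContext): Unit = ()
      override def abortTask(taskContext: TaskAttemptContext): Unit = ()
    }

You would then select it through the old-API configuration key before
saving, e.g. sc.hadoopConfiguration.set("mapred.output.committer.class",
classOf[DirectOutputCommitter].getName).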
Thomas Demoor
On Wed, Jan 28, 2015 at 3:54 AM, Josh Walton wrote:
> I'm not sure how to confirm how the moving is happening; however, one of
> the job [...]
FYI. We're currently addressing this at the Hadoop level in
https://issues.apache.org/jira/browse/HADOOP-9565
Thomas Demoor
On Mon, Feb 23, 2015 at 10:16 PM, Darin McBeath wrote:
> Just to close the loop in case anyone runs into the same problem I had.
>
> By setting --hadoop-m [...]