Re: Spark with S3 DirectOutputCommitter

2016-09-13 Thread Steve Loughran
On 12 Sep 2016, at 19:58, Srikanth <srikanth...@gmail.com> wrote: Thanks Steve! We are already using HDFS as an intermediate store. This is for the last stage of processing which has to put data in S3. The output is partitioned by 3 fields, like .../field1=111/field2=999/date=-MM-D

Re: Spark with S3 DirectOutputCommitter

2016-09-12 Thread Srikanth
Thanks Steve! We are already using HDFS as an intermediate store. This is for the last stage of processing, which has to put data in S3. The output is partitioned by 3 fields, like .../field1=111/field2=999/date=-MM-DD/* Given that there are 100s of folders and 1000s of subfolders and part file

Re: Spark with S3 DirectOutputCommitter

2016-09-11 Thread Steve Loughran
> On 9 Sep 2016, at 21:54, Srikanth wrote:
>
> Hello,
>
> I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a few configs and none of them seem to work.
> Output always creates _temporary directory. Rename is killing performance.
> I read some notes about DirectOutput