On 12 Sep 2016, at 19:58, Srikanth
<srikanth...@gmail.com> wrote:
Thanks Steve!
We are already using HDFS as an intermediate store. This is for the last
stage of processing which has to put data in S3.
The output is partitioned by 3 fields, like
.../field1=111/field2=999/date=yyyy-MM-dd/*
Given that there are 100s of folders and 1000s of subfolders and part
files
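For readers unfamiliar with the layout being described: Spark's `partitionBy` writes output in Hive-style `key=value` directories, one level per partition column. A minimal sketch of how the columns map to the directory structure (the bucket name and values here are made up for illustration):

```python
def partition_path(base, **parts):
    """Build a Hive-style partition directory path, as Spark's
    DataFrameWriter.partitionBy lays out output on disk/S3."""
    return base + "/" + "/".join(f"{k}={v}" for k, v in parts.items())

# Three partition columns produce three nested directory levels:
print(partition_path("s3a://bucket/out", field1=111, field2=999, date="2016-09-12"))
# s3a://bucket/out/field1=111/field2=999/date=2016-09-12
```

With hundreds of `field1` values and thousands of `field2`/`date` combinations underneath them, the commit-time rename has to move every part file under every one of these leaf directories, which is why rename cost dominates on S3.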
> On 9 Sep 2016, at 21:54, Srikanth wrote:
>
> Hello,
>
> I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a
> few configs and none of them seem to work.
> Output always creates _temporary directory. Rename is killing performance.
> I read some notes about DirectOutput
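For context on why those configs don't take effect: the direct Parquet output committer that shipped with Spark 1.x was removed in Spark 2.0, so there is no built-in direct committer left to configure. One commonly suggested mitigation for the rename cost is the v2 commit algorithm of Hadoop's FileOutputCommitter, which renames each task's output straight into the destination directory at task commit instead of deferring everything to a job-level rename pass. A sketch of the relevant settings (standard Hadoop property names passed through Spark's `spark.hadoop.*` prefix; whether this helps enough depends on your Hadoop version and output size):

```properties
# spark-defaults.conf (or pass with --conf on spark-submit)

# FileOutputCommitter algorithm v2: task commit renames output directly
# into the final destination, skipping the second, job-level rename.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2

# Optional: skip writing _SUCCESS marker files in the output directory.
spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs  false
```

Note that v2 still stages task attempts under a `_temporary` directory; it reduces the rename work but does not eliminate it, so it is a partial mitigation rather than a true direct write.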