Well, the issue I'm trying to solve is slow writes caused by S3 implementing move as a copy followed by a delete. It seems like your S3 committers and S3Guard both ameliorate that somewhat by parallelizing the copies. I assume there's no better way to solve this issue without sacrificing safety. Even if ther…
Thanks for the repo, Ryan! I had heard that Netflix had a committer that
used the local filesystem as a temporary store, but I wasn't able to find
that anywhere until now. I implemented something similar that writes to
HDFS and then copies to S3, but it doesn't use the multipart upload API, so
I'…
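For context, a minimal sketch of that write-to-HDFS-then-copy approach, assuming Hadoop's `FileUtil.copy` is used for the transfer (the paths and bucket below are placeholders, not from the original post). Because `FileUtil.copy` streams each file to the destination in a single request, it gets no benefit from the multipart upload API:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

object CopyOutputToS3 {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Hypothetical paths: the Spark job has already written its output to HDFS.
    val hdfsPath = new Path("hdfs:///tmp/job-output")
    val s3Path   = new Path("s3a://my-bucket/warehouse/table")

    val hdfs = hdfsPath.getFileSystem(conf)
    val s3   = s3Path.getFileSystem(conf)

    // Stage on HDFS (fast, atomic rename on commit), then copy the finished
    // files to S3. FileUtil.copy walks the source tree and uploads each file
    // as one stream, so there is no per-file multipart parallelism.
    FileUtil.copy(hdfs, hdfsPath, s3, s3Path, /* deleteSource = */ false, conf)
  }
}
```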
I'm using Spark 1.5.2 and trying to append a DataFrame to a partitioned Parquet directory in S3. It is known that the default `ParquetOutputCommitter` performs poorly in S3 because move is implemented as copy/delete, but the `DirectParquetOutputCommitter` is not safe to use for append operations in…
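For reference, the kind of write in question looks roughly like this in the Spark 1.5 API (the bucket, paths, and partition column are placeholders, not from the original post):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val sc = new SparkContext(new SparkConf().setAppName("append-parquet-s3"))
val sqlContext = new SQLContext(sc)

// Placeholder input; in practice the DataFrame comes from the upstream job.
val df = sqlContext.read.json("hdfs:///tmp/incoming/")

// Appending to a partitioned Parquet directory on S3. With the default
// ParquetOutputCommitter, the commit phase renames files out of _temporary,
// and on S3 each rename is a copy followed by a delete.
df.write
  .mode(SaveMode.Append)
  .partitionBy("date")
  .parquet("s3a://my-bucket/warehouse/events/")
```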