Does S3Guard help with this? I thought it was like S3mper and could
help detect eventual consistency problems, but wouldn't help with the
committer problem.

rb

On Tue, Feb 21, 2017 at 12:39 PM, Matthew Schauer
<matthew.scha...@ibm.com> wrote:
> Thanks for the repo, Ryan!  I had heard that Netflix had a committer that
> used the local filesystem as a temporary store, but I wasn't able to find
> that anywhere until now.  I implemented something similar that writes to
> HDFS and then copies to S3, but it doesn't use the multipart upload API, so
> I'm sure yours will be faster.  I think this is the best thing until S3Guard
> comes out.
>
> As far as my UUID-tracking approach goes, I was under the impression that a
> given task would write the same set of files on each attempt.  Thus, if the
> task fails, either the whole job is aborted and the files are removed, or
> the task is retried and the files are overwritten.  On the other hand, I can
> see how having partially-written data visible to readers immediately could
> cause problems, and that is a good reason to avoid my approach.
>
> Steve -- that design document was a very enlightening read.  I will be
> interested in following and possibly contributing to S3Guard in the future.
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Output-Committers-for-S3-tp21033p21041.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>



-- 
Ryan Blue
Software Engineer
Netflix
