Re: DirectFileOutputCommiter

2016-02-29 Thread Steve Loughran
> On 26 Feb 2016, at 06:24, Takeshi Yamamuro wrote:
>
> Hi,
>
> Great work!
> What is the concrete performance gain of the committer on s3?
> I'd like to know.
>
> I think there is no direct committer for files because these kinds of
> committers risk losing data (see SPARK-10063).
>

Re: DirectFileOutputCommiter

2016-02-29 Thread Takeshi Yamamuro
>> >>> Wanted to understand if anybody uses DirectFileOutputCommitter or
>>>>> alikes
>>>>> >>> especially when working with s3?
>>>

Re: DirectFileOutputCommiter

2016-02-27 Thread Igor Berman
nted to understand if anybody uses DirectFileOutputCommitter or
>>>> alikes
>>>> >>> especially when working with s3?
>>>> >>> I know that there is one impl in spark distro for parquet format,
>>>> but not
>>>> >>> for files

Re: DirectFileOutputCommiter

2016-02-26 Thread Alexander Pivovarov
body uses DirectFileOutputCommitter or
>>>> alikes
>>>> >>> especially when working with s3?
>>>> >>> I know that there is one impl in spark distro for parquet format,
>>>> but not
>>>> >>> for files - why?
>>>

Re: DirectFileOutputCommiter

2016-02-26 Thread Reynold Xin
bring huge performance boost.
>>> >>> Using default FileOutputCommitter with s3 has big overhead at commit
>>> stage
>>> >>> when all parts are copied one-by-one to destination dir from
>>> _temporary,
>>> >>> which is bottleneck

Re: DirectFileOutputCommiter

2016-02-26 Thread Igor Berman
h s3 has big overhead at commit
>> stage
>> >>> when all parts are copied one-by-one to destination dir from
>> _temporary,
>> >>> which is bottleneck when number of partitions is high.
>> >>>
>> >>> Also, wanted

Re: DirectFileOutputCommiter

2016-02-26 Thread Igor Berman
>> for files - why?
>>> >>>
>>> >>> Imho, it can bring huge performance boost.
>>> >>> Using default FileOutputCommitter with s3 has big overhead at commit
>>> stage
>>> >>> when all parts are copied one-by-one to de

Re: DirectFileOutputCommiter

2016-02-26 Thread Igor Berman
> _temporary,
>> >>> which is bottleneck when number of partitions is high.
>> >>>
>> >>> Also, wanted to know if there are some problems when using
>> >>> DirectFileOutputCommitter?
>> >>> If writing one partition directly wil

Re: DirectFileOutputCommiter

2016-02-26 Thread Teng Qiu
commit
>> stage
>> >>> when all parts are copied one-by-one to destination dir from
>> _temporary,
>> >>> which is bottleneck when number of partitions is high.
>> >>>
>> >>> Al

Re: DirectFileOutputCommiter

2016-02-26 Thread Alexander Pivovarov
has big overhead at commit
>> stage
>> >>> when all parts are copied one-by-one to destination dir from
>> _temporary,
>> >>> which is bottleneck when number of partitions is high.
>> >>>
>> >>> Also, wanted to k

Re: DirectFileOutputCommiter

2016-02-25 Thread Takeshi Yamamuro
n using
> >>> DirectFileOutputCommitter?
> >>> If writing one partition directly will fail in the middle is spark will
> >>> notice this and will fail job(say after all retries)?
> >>>
> >>> thanks in advance
> >>>
> >>> >

Re: DirectFileOutputCommiter

2016-02-25 Thread Teng Qiu
now if there are some problems when using
>>> DirectFileOutputCommitter?
>>> If writing one partition directly will fail in the middle is spark will
>>> notice this and will fail job(say after all retries)?
>>>
>>> thanks in

Re: DirectFileOutputCommiter

2016-02-25 Thread Yin Yang
OutputCommitter?
>> If writing one partition directly will fail in the middle is spark will
>> notice this and will fail job(say after all retries)?
>>
>> thanks in advance
>>
>>
>>
>>
>> --
>>

Re: DirectFileOutputCommiter

2016-02-25 Thread Teng Qiu
DirectFileOutputCommitter?
> If writing one partition directly will fail in the middle is spark will
> notice this and will fail job(say after all retries)?
>
> thanks in advance
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3

DirectFileOutputCommiter

2016-02-22 Thread igor.berman
spark will notice this and will fail job(say after all retries)?
thanks in advance
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DirectFileOutputCommiter-tp26296.html
Sent from the Apache Spark User List mailing list archive at Nabble.com