> On 26 Feb 2016, at 06:24, Takeshi Yamamuro wrote:
>
> Hi,
>
> Great work!
> What is the concrete performance gain of the committer on s3?
> I'd like to know.
>
> I think there is no direct committer for files because these kinds of
> committers risk losing data (see SPARK-10063).
>
>> I wanted to understand whether anybody uses DirectFileOutputCommitter or
>> similar committers, especially when working with S3.
>> I know there is one implementation in the Spark distribution for the
>> Parquet format, but not for plain files. Why?
>>
>> IMHO, it can bring a huge performance boost.
>> Using the default FileOutputCommitter with S3 has a big overhead at the
>> commit stage, when all parts are copied one by one from _temporary to the
>> destination directory; this becomes a bottleneck when the number of
>> partitions is high.
>>
>> I also wanted to know whether there are any problems with using
>> DirectFileOutputCommitter.
>> If writing one partition directly fails midway, will Spark notice this
>> and fail the job (say, after all retries)?
>>
>> Thanks in advance
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/DirectFileOutputCommiter-tp26296.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com
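
The trade-off raised in this thread (per-file copy cost at commit versus the risk of partial output with a direct committer) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyObjectStore`, `write_with_rename_commit` and `write_direct` are hypothetical names invented here, the store is an in-memory dict, and "rename" is modelled as one copy plus one delete per object. It is not Spark or Hadoop code and does not reproduce the real committer protocol.

```python
# Toy model only: an in-memory "object store" where rename has to be done
# as copy + delete for every object (assumed S3-like behaviour).

class ToyObjectStore:
    def __init__(self):
        self.objects = {}   # key -> bytes
        self.copies = 0     # per-object copy operations performed so far

    def put(self, key, data):
        self.objects[key] = data

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_prefix(self, src, dst):
        # No atomic directory rename: each object is moved individually.
        for key in self.list_keys(src):
            self.objects[dst + key[len(src):]] = self.objects.pop(key)
            self.copies += 1


def write_with_rename_commit(store, out, parts, fail_at=None):
    """Write parts under out/_temporary/, then move them all at commit."""
    tmp = out + "/_temporary/"
    for i, data in enumerate(parts):
        if i == fail_at:
            # Simulated failure: discard temp output, destination untouched.
            for key in store.list_keys(tmp):
                del store.objects[key]
            return False
        store.put(f"{tmp}part-{i:05d}", data)
    store.rename_prefix(tmp, out + "/")  # cost grows with number of parts
    return True


def write_direct(store, out, parts, fail_at=None):
    """Write parts straight into out/ with no separate commit step."""
    for i, data in enumerate(parts):
        if i == fail_at:
            return False  # parts written earlier stay visible in out/
        store.put(f"{out}/part-{i:05d}", data)
    return True
```

In this model the rename-based path performs one copy per part file at commit but leaves the destination empty after a failure, while the direct path performs no copies but leaves the already-written parts behind. Whether a real job notices and cleans up such leftovers depends on the committer and the retry behaviour, which the toy does not attempt to model.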