To: XIMO GUANTER GONZALBEZ
Cc: Reynold Xin , Spark Dev List
Subject: Re: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
Hi Ximo, sorry for the delay; I was busy with other things. I will raise a PR
this week and ping you for review so I can make use of your help. Thanks.
Cheng
hands!
Thanks,
Ximo.
From: Cheng Su
Sent: Wednesday, September 9, 2020 8:57
To: XIMO GUANTER GONZALBEZ; Reynold Xin
CC: Spark Dev List
Subject: Re: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
Thanks, Ximo. On our side, we do see similar cases in
v@spark.apache.org>>
Subject: RE: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
> 1. If the number of writers exceeds a predefined threshold (controlled by
> a config), we sort the rest of the input rows and fall back to the current
> mode for writing.
> The config
objection.
Thanks,
Cheng Su
From: XIMO GUANTER GONZALBEZ
Date: Sunday, September 6, 2020 at 10:55 PM
To: Cheng Su , Reynold Xin
Cc: Spark Dev List
Subject: RE: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
> 1.If number of writers exceeds a pre-defi
performance in our
scenario.
Cheers,
Ximo.
From: Cheng Su
Sent: Friday, September 4, 2020 20:38
To: Reynold Xin; XIMO GUANTER GONZALBEZ
CC: Spark Dev List
Subject: Re: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
Hi,
Just for context - I created th
Date: Saturday, September 5, 2020 at 12:54 AM
To: Cheng Su
Cc: Reynold Xin , XIMO GUANTER GONZALBEZ
, Spark Dev List
Subject: Re: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
Hi Cheng,
Is there some place where I can get more details on this, or if you could g
>
>
> *From: *Reynold Xin
> *Date: *Friday, September 4, 2020 at 10:33 AM
> *To: *XIMO GUANTER GONZALBEZ
> *Cc: *Spark Dev List
> *Subject: *Re: Avoiding unnecessary sort in
> FileFormatWriter/DynamicPartitionDataWriter
>
>
>
to get more opinions on this. Thanks.
Cheng Su
From: Reynold Xin
Date: Friday, September 4, 2020 at 10:33 AM
To: XIMO GUANTER GONZALBEZ
Cc: Spark Dev List
Subject: Re: Avoiding unnecessary sort in
FileFormatWriter/DynamicPartitionDataWriter
The issue is memory overhead. Writing files creates a lot of buffers (especially
in columnar formats like Parquet/ORC). Even a few file handles and buffers per
task can easily OOM the entire process.
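
To make the trade-off concrete, here is a minimal sketch in plain Python (not
Spark code) of the threshold idea quoted elsewhere in this thread: write without
sorting while the number of open writers stays under a limit, then sort the
remaining rows and fall back to one writer at a time. The function name
write_partitioned, the max_open_writers parameter, and the (partition, value)
row shape are all hypothetical, and in-memory lists stand in for file buffers.
It only counts how many writers would be open at once; it says nothing about
real buffer sizes.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (partition value, payload); hypothetical row shape


def write_partitioned(
    rows: Iterable[Row], max_open_writers: int
) -> Tuple[Dict[str, List[int]], int]:
    """Simulate a hybrid partitioned write.

    Returns (rows written per partition, peak number of open writers).
    """
    it = iter(rows)
    output: Dict[str, List[int]] = {}
    open_writers: Dict[str, List[int]] = {}  # stands in for open file buffers
    peak = 0

    # Phase 1: no sort; keep one writer open per partition seen so far.
    for key, value in it:
        if key not in open_writers:
            if len(open_writers) >= max_open_writers:
                # Threshold reached: keep this row and everything after it
                # for the sorted fallback path.
                remaining = [(key, value)] + list(it)
                break
            open_writers[key] = output.setdefault(key, [])
        open_writers[key].append(value)
        peak = max(peak, len(open_writers))
    else:
        remaining = []  # input exhausted without hitting the threshold

    # Phase 2: close all writers, sort the rest by partition, and write with
    # a single open writer at a time (the sort-based mode).
    open_writers.clear()
    for key, value in sorted(remaining, key=lambda r: r[0]):
        peak = max(peak, 1)
        output.setdefault(key, []).append(value)

    return output, peak


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)]
hybrid_output, hybrid_peak = write_partitioned(rows, max_open_writers=2)
unsorted_output, unsorted_peak = write_partitioned(rows, max_open_writers=10)
```

With a limit of 2 the sketch never holds more than two writers open, while a
limit large enough to avoid the fallback holds one per distinct partition (four
here). That per-partition count is the quantity that drives the memory overhead
described above.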
On Fri, Sep 04, 2020 at 5:51 AM, XIMO GUANTER GONZALBEZ <
joaquin.guantergonzal...@telefonica.co
Hello,
I have observed that if a DataFrame is saved with partitioning columns in
Parquet, then a sort is performed in FileFormatWriter (see
https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L152)
because Dynamic