Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2021-04-27 Thread Cheng Su
: XIMO GUANTER GONZALBEZ Cc: Reynold Xin, Spark Dev List Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter Hi Ximo, sorry for the delay, I was busy with other things. I will raise a PR this week and ping you for review so I can draw on your help. Thanks, Cheng

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-21 Thread Cheng Su
hands! Thanks, Ximo. From: Cheng Su Sent: Wednesday, September 9, 2020 8:57 To: XIMO GUANTER GONZALBEZ; Reynold Xin Cc: Spark Dev List Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter Thanks, Ximo. On our side, we do see similar cases in

RE: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-21 Thread XIMO GUANTER GONZALBEZ
v@spark.apache.org>> Subject: RE: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter > 1. If the number of writers exceeds a pre-defined threshold (controlled by a config), we sort the rest of the input rows and fall back to the current mode for write. > The config
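
The proposal quoted above (write without sorting until the number of open writers passes a configured threshold, then sort the remaining rows and fall back to the current one-writer-at-a-time mode) can be illustrated with a small sketch. This is not Spark code: `write_partitioned`, `max_open_writers`, and the `(partition_key, payload)` row shape are hypothetical names chosen for illustration, in-memory lists stand in for file writers, and closing all open writers at the fallback point is an assumption the thread does not confirm.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (partition_key, payload); hypothetical row shape


def write_partitioned(
    rows: Iterable[Row], max_open_writers: int
) -> Tuple[Dict[str, List[str]], int]:
    """Return (output per partition, peak number of simultaneously open writers)."""
    it: Iterator[Row] = iter(rows)
    output: Dict[str, List[str]] = {}
    open_writers: Dict[str, List[str]] = {}  # stand-in for open file handles
    peak = 0

    # Phase 1: no sort; keep one writer open per partition seen so far.
    for key, payload in it:
        if key not in open_writers:
            if len(open_writers) >= max_open_writers:
                # Threshold reached: close all writers (an assumption) and
                # collect the current row plus everything not yet consumed.
                open_writers.clear()
                rest = [(key, payload), *it]
                break
            open_writers[key] = output.setdefault(key, [])
            peak = max(peak, len(open_writers))
        open_writers[key].append(payload)
    else:
        # Input exhausted without exceeding the threshold: no sort needed.
        return output, peak

    # Phase 2: sort the remaining rows by partition key, so that only one
    # writer needs to be open at any time (the existing sort-based mode).
    rest.sort(key=lambda r: r[0])
    for key, payload in rest:
        output.setdefault(key, []).append(payload)
    return output, max(peak, 1)


rows = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("b", "5"), ("a", "6")]
result, peak_writers = write_partitioned(rows, max_open_writers=2)
```

With a threshold of 2, the first three rows are written unsorted and the fourth (a third distinct partition) triggers the fallback; with a threshold of 3 or more, the sort is skipped entirely for this input. The threshold bounds how many writer buffers exist at once, which is the memory concern raised elsewhere in the thread.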

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-08 Thread Cheng Su
objection. Thanks, Cheng Su From: XIMO GUANTER GONZALBEZ Date: Sunday, September 6, 2020 at 10:55 PM To: Cheng Su, Reynold Xin Cc: Spark Dev List Subject: RE: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter > 1. If the number of writers exceeds a pre-defi

RE: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-06 Thread XIMO GUANTER GONZALBEZ
performance in our scenario. Cheers, Ximo. From: Cheng Su Sent: Friday, September 4, 2020 20:38 To: Reynold Xin; XIMO GUANTER GONZALBEZ Cc: Spark Dev List Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter Hi, Just for context - I created th

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-05 Thread Cheng Su
Date: Saturday, September 5, 2020 at 12:54 AM To: Cheng Su Cc: Reynold Xin , XIMO GUANTER GONZALBEZ , Spark Dev List Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter Hi Cheng, Is there some place where I can get more details on this, or if you could g

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-05 Thread kalyan
> From: Reynold Xin > Date: Friday, September 4, 2020 at 10:33 AM > To: XIMO GUANTER GONZALBEZ > Cc: Spark Dev List > Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-04 Thread Cheng Su
to get more opinions on this. Thanks. Cheng Su From: Reynold Xin Date: Friday, September 4, 2020 at 10:33 AM To: XIMO GUANTER GONZALBEZ Cc: Spark Dev List Subject: Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter The issue is memory overhead. Writing files create

Re: Avoiding unnnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-04 Thread Reynold Xin
The issue is memory overhead. Writing files creates a lot of buffers (especially in columnar formats like Parquet/ORC). Even a few file handles and buffers per task can easily OOM the entire process. On Fri, Sep 04, 2020 at 5:51 AM, XIMO GUANTER GONZALBEZ < joaquin.guantergonzal...@telefonica.co