RE: Avoiding unnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-21 Thread XIMO GUANTER GONZALBEZ
pushing it forward yourself 😊 Let me know if you need an extra pair of hands! Thanks, Ximo. From: Cheng Su Sent: Wednesday, September 9, 2020 8:57 To: XIMO GUANTER GONZALBEZ; Reynold Xin Cc: Spark Dev List Subject: Re: Avoiding unnecessary sort in FileFormatWriter

RE: Avoiding unnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-06 Thread XIMO GUANTER GONZALBEZ
performance in our scenario. Cheers, Ximo. From: Cheng Su Sent: Friday, September 4, 2020 20:38 To: Reynold Xin; XIMO GUANTER GONZALBEZ Cc: Spark Dev List Subject: Re: Avoiding unnecessary sort in FileFormatWriter/DynamicPartitionDataWriter Hi, Just for context - I created th

Avoiding unnecessary sort in FileFormatWriter/DynamicPartitionDataWriter

2020-09-04 Thread XIMO GUANTER GONZALBEZ
Hello, I have observed that if a DataFrame is saved with partitioning columns in Parquet, then a sort is performed in FileFormatWriter (see https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L152) because Dynamic