Hi Enrico,
Thanks for replying. I want to partition by one column and then sort
within those partitions by another column. DataFrameWriter has sortBy and
bucketBy, but they require creating a new table (they only work with
`saveAsTable`, not with plain `save`). I can write another job on top whi
Nikhil,
What are you trying to achieve with this in the first place? What are
your goals? What is the problem with your approach?
Are you concerned about the 1000 files in each written col2-partition?
write.partitionBy is different from df.repartition or
df.coalesce.
The df p
Hi folks,
We are trying to do
`
df.coalesce(1000).sortWithinPartitions("col1").write.mode('overwrite').partitionBy("col2").parquet(...)
`
I do see that coalesce(1000) is applied to every sub-partition, but I
wanted to know whether sortWithinPartitions("col1") takes effect before
or after partitionBy.
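
To make the question concrete, here is a minimal plain-Python model (not Spark code) of what a single write task does under partitionBy: it splits the rows it holds into one output file per distinct col2 value. The helper `write_partitioned` and the sample rows are hypothetical, made up only for illustration. The model assumes rows keep their relative order when they are split into files. I am not certain Spark guarantees that when the rows are sorted by col1 alone, which is why the sketch sorts by col2 first and col1 second.

```python
from collections import defaultdict


def write_partitioned(task_rows, partition_col):
    """Model one write task: one 'file' per partition value, row order kept."""
    files = defaultdict(list)
    for row in task_rows:
        files[row[partition_col]].append(row)
    return dict(files)


# Hypothetical rows held by a single in-memory partition (one task).
task_rows = [
    {"col1": 3, "col2": "a"},
    {"col1": 1, "col2": "b"},
    {"col1": 2, "col2": "a"},
    {"col1": 0, "col2": "b"},
]

# Analogue of sortWithinPartitions("col2", "col1"): sort inside the task only,
# with the partition column first so rows for each file are already contiguous.
sorted_rows = sorted(task_rows, key=lambda r: (r["col2"], r["col1"]))

files = write_partitioned(sorted_rows, "col2")
col1_per_file = {value: [r["col1"] for r in rows] for value, rows in files.items()}
print(col1_per_file)
```

In Spark terms the corresponding change would be calling sortWithinPartitions("col2", "col1") instead of sortWithinPartitions("col1") in the statement above. Whether each written file then ends up sorted by col1 is worth confirming on your Spark version by reading one file back and checking the order.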