subject:"PartitionBy and SortWithinPartitions"

Re: PartitionBy and SortWithinPartitions

2022-06-03 Thread Nikhil Goyal

Hi Enrico, Thanks for replying. I want to partition by one column and then sort within those partitions by another column. DataFrameWriter has sortBy and bucketBy, but they require creating a new table (they only work with `saveAsTable`, not with plain `save`). I can write another job on top whi

Re: PartitionBy and SortWithinPartitions

2022-06-03 Thread Enrico Minack

Nikhil, What are you trying to achieve with this in the first place? What are your goals? What is the problem with your approach? Are you concerned about the 1000 files in each written col2-partition? The write.partitionBy is something different than df.repartition or df.coalesce. The df p

PartitionBy and SortWithinPartitions

2022-06-03 Thread Nikhil Goyal

Hi folks, We are trying to do ` df.coalesce(1000).sortWithinPartitions("col1").write.mode('overwrite').partitionBy("col2").parquet(...) ` I do see that coalesce 1000 is applied for every sub partition. But I wanted to know if sortWithinPartitions(col1) works after applying partitionBy or before?
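On the ordering question: `sortWithinPartitions` is a DataFrame transformation, so it runs on the in-memory partitions before the writer sees any rows, while `write.partitionBy` only controls how the writer lays out `col2=<value>/` directories on disk. A minimal sketch of one common way to get files sorted by col1 inside each col2 directory is below. It is not taken from the thread. The helper name `write_sorted_within_col2` and the `path` argument are hypothetical, and the column names are the placeholders from the question. It assumes that putting col2 first in the sort means the rows already have the ordering the writer needs for dynamic partitions, so the col1 order is not disturbed at write time.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; nothing from Spark is imported at runtime
    from pyspark.sql import DataFrame


def write_sorted_within_col2(df: DataFrame, path: str) -> None:
    """Write one directory per col2 value with rows ordered by col1 in each file.

    Hypothetical helper: the name, `path`, and the columns "col1"/"col2"
    are illustrative placeholders, not part of any Spark API.
    """
    (
        df.repartition("col2")                 # bring all rows of a col2 value into one in-memory partition
        .sortWithinPartitions("col2", "col1")  # sort each partition by col2, then by col1, before writing
        .write.mode("overwrite")
        .partitionBy("col2")                   # on-disk layout: one col2=<value>/ directory per value
        .parquet(path)
    )
```

Note the trade-off: `repartition("col2")` puts every row of a given col2 value into a single task, so each col2 directory ends up with one file rather than up to 1000 as with `coalesce(1000)`. Whether that is acceptable depends on how large each col2 value is.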

3 matches
