Yes, you can expect each partition file to be sorted by "col1" and "col2".
However, values of "col1" are allocated to partition files in an
effectively random (hash-based) way, though all rows with the same value
of "col1" will reside in the same partition file.
What kind of unexpected sort order do you observe?
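To illustrate the allocation described above, here is a minimal plain-Scala sketch without Spark. It assumes a simple `hashCode`-modulo rule as a stand-in for Spark's hash partitioning, so the actual partition ids in Spark will differ; `HashPartitionSketch`, `Row`, `partitionOf` and `assign` are hypothetical names used only for illustration.

```scala
object HashPartitionSketch {
  // Hypothetical row type with the two columns from the question.
  final case class Row(col1: String, col2: Int)

  // Simplified stand-in for hash partitioning (an assumption): Spark uses
  // its own hash function, so real partition ids will not match these.
  def partitionOf(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Group rows by partition id, then sort each partition by (col1, col2),
  // mirroring repartition($"col1") followed by sortWithinPartitions.
  def assign(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows
      .groupBy(r => partitionOf(r.col1, numPartitions))
      .map { case (id, rs) => id -> rs.sortBy(r => (r.col1, r.col2)) }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row("b", 2), Row("a", 3), Row("c", 1), Row("a", 1), Row("b", 1))
    // Print each simulated partition with its rows in sorted order.
    assign(rows, numPartitions = 2).toSeq.sortBy(_._1).foreach { case (id, rs) =>
      println(s"partition $id: ${rs.mkString(", ")}")
    }
  }
}
```

Which "col1" values end up together depends only on the hash, but every row with a given "col1" value lands in exactly one partition, and each partition is sorted on its own.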
Enrico
On 16.09.22 at 05:42, Swetha Baskaran wrote:
Hi!
We expected the order of sorted partitions to be preserved after a
dataframe write. We use the following code to write out one file per
partition, with the rows sorted by a column.
    df
      .repartition($"col1")
      .sortWithinPartitions("col1", "col2")
      .write
      .partitionBy("col1")
      .csv(path)
However, we observe an unexpected sort order in some files. Does Spark
guarantee sort order within partitions on write?
Thanks,
swebask