Yes, you can expect each partition file to be sorted by "col1" and "col2".

However, values of "col1" will be "randomly" allocated to partition files, but all rows with the same "col1" value will reside in a single partition file.
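The allocation described above can be illustrated without Spark. Below is a minimal pure-Scala simulation of `repartition($"col1")` followed by a per-partition sort on ("col1", "col2"). It assumes a plain `hashCode`-modulo scheme as a stand-in for Spark's own hash partitioning (Spark hashes with Murmur3, so actual partition assignments differ). The row type `Row2`, the object `PartitionSketch`, its methods and the sample data are all hypothetical.

```scala
// Hypothetical row type standing in for a DataFrame row with columns col1 and col2.
final case class Row2(col1: String, col2: Int)

object PartitionSketch {
  // Stand-in for repartition($"col1"): each row goes to partition hash(col1) mod n.
  // String.hashCode is used only for illustration; Spark uses a Murmur3 hash.
  def repartitionByCol1(rows: Seq[Row2], numPartitions: Int): Vector[Vector[Row2]] = {
    val buckets = Vector.fill(numPartitions)(Vector.newBuilder[Row2])
    rows.foreach { r =>
      val p = Math.floorMod(r.col1.hashCode, numPartitions)
      buckets(p) += r
    }
    buckets.map(_.result())
  }

  // Stand-in for sortWithinPartitions("col1", "col2"): every partition is sorted
  // on its own; nothing orders rows across partitions.
  def sortEachPartition(parts: Vector[Vector[Row2]]): Vector[Vector[Row2]] =
    parts.map(_.sortBy(r => (r.col1, r.col2)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row2("b", 3), Row2("a", 2), Row2("c", 1), Row2("a", 1), Row2("b", 1), Row2("c", 5))
    val parts = sortEachPartition(repartitionByCol1(rows, numPartitions = 2))
    parts.zipWithIndex.foreach { case (p, i) =>
      val shown = p.mkString(", ")
      println(s"partition $i: $shown")
    }
  }
}
```

Which partition a value lands in depends only on its hash, hence the "random" allocation. What is guaranteed is that no "col1" value is split across partitions and that each partition is sorted by ("col1", "col2").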

What kind of unexpected sort order do you observe?
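One way to pin this down is to look for the first row in each written file that breaks the expected order. This is a minimal sketch under several assumptions: the lines are simple comma-separated values without quoting or a header, and "col2" is an integer at a known column index. With `partitionBy("col1")`, "col1" itself is encoded in the directory name rather than written into the files, so only the "col2" order can be checked per file. `SortCheck`, `firstOutOfOrder`, `checkCsvLines` and `col2Index` are hypothetical names.

```scala
object SortCheck {
  // Returns the index of the first element smaller than its predecessor,
  // or None if the sequence is non-decreasing.
  def firstOutOfOrder[T](xs: Seq[T])(implicit ord: Ordering[T]): Option[Int] = {
    val v = xs.toVector
    (1 until v.length).find(i => ord.lt(v(i), v(i - 1)))
  }

  // Hypothetical helper: parses an integer column from naive CSV lines
  // (no quoting, no embedded commas) and checks that it is non-decreasing.
  def checkCsvLines(lines: Seq[String], col2Index: Int): Option[Int] =
    firstOutOfOrder(lines.map(_.split(",", -1)(col2Index).trim.toLong))

  def main(args: Array[String]): Unit = {
    // In practice these lines would be read from one file under path/col1=<value>/.
    val lines = Seq("1,x", "2,y", "2,z", "1,w")
    checkCsvLines(lines, col2Index = 0) match {
      case Some(i) => println(s"row $i is out of order: ${lines(i)}")
      case None    => println("file is sorted by col2")
    }
  }
}
```

If "col2" is a string column, compare the raw strings instead of parsing them: numeric and lexicographic orders differ (as strings, "10" sorts before "9").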

Enrico



On 16.09.22 at 05:42, Swetha Baskaran wrote:
Hi!

We expected the sort order within partitions to be preserved after a DataFrame write. We use the following code to write out one file per partition, with the rows sorted by a column.

import spark.implicits._ // for the $"col1" syntax; assumes a SparkSession named spark

df
    .repartition($"col1")
    .sortWithinPartitions("col1", "col2")
    .write
    .partitionBy("col1")
    .csv(path)

However, we observe an unexpected sort order in some files. Does Spark guarantee the sort order within partitions on write?


Thanks,
swebask
