Re: [Spark Internals]: Is sort order preserved after partitioned write?

Enrico Minack Thu, 15 Sep 2022 22:46:03 -0700

Yes, you can expect each partition file to be sorted by "col1" and "col2".

However, values for "col1" will be "randomly" allocated to partition files, but all rows with the same value for "col1" will reside in the same partition file.
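
To make that concrete, here is a toy model in plain Scala, with no Spark dependency. It is only a sketch of the behaviour described above, not Spark's implementation: the names Row, repartitionByCol1 and sortWithinPartitions are made up for illustration, and String.hashCode stands in for whatever hash function Spark actually uses.

```scala
// Toy model only: mimics repartition + sortWithinPartitions on an in-memory Seq.
// All names here are hypothetical and do not come from the Spark API.
object PartitionSortSketch {
  final case class Row(col1: String, col2: Int)

  // Stand-in for repartition($"col1"): hash col1 into one of numPartitions buckets.
  // hashCode is a placeholder assumption; Spark uses its own hash function.
  def repartitionByCol1(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy(r => Math.floorMod(r.col1.hashCode, numPartitions))

  // Stand-in for sortWithinPartitions("col1", "col2"): sort each bucket on its own.
  def sortWithinPartitions(parts: Map[Int, Seq[Row]]): Map[Int, Seq[Row]] =
    parts.map { case (id, rows) => id -> rows.sortBy(r => (r.col1, r.col2)) }

  val rows: Seq[Row] = Seq(
    Row("b", 3), Row("a", 2), Row("c", 1), Row("a", 1), Row("b", 1), Row("c", 5)
  )

  val partitions: Map[Int, Seq[Row]] =
    sortWithinPartitions(repartitionByCol1(rows, 2))

  def main(args: Array[String]): Unit =
    partitions.toSeq.sortBy(_._1).foreach { case (id, rs) =>
      println(s"partition $id: ${rs.mkString(", ")}")
    }
}
```

Which bucket a given "col1" value lands in depends only on the hash, so it looks arbitrary, but every row with that value ends up in the same bucket, and each bucket is sorted by ("col1", "col2").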


What kind of unexpected sort order do you observe?

Enrico



On 16.09.22 at 05:42, Swetha Baskaran wrote:

Hi!
We expected the order of sorted partitions to be preserved after a dataframe write. We use the following code to write out one file per partition, with the rows sorted by a column.
df
    .repartition($"col1")
    .sortWithinPartitions("col1", "col2")
    .write
    .partitionBy("col1")
    .csv(path)
However, we observe unexpected sort order in some files. Does Spark guarantee sort order within partitions on write?
Thanks,
swebask
