Hi:

I would like to sort historical data using the DataSet API.

env.setParallelism(10)

val dataset = [(Long, String)] ..
.partitionByRange(_._1)
.sortPartition(_._1, Order.ASCENDING)
.writeAsCsv("mydata.csv").setParallelism(1)

The data in the resulting file is out of order (it is sorted only locally,
within each partition), but replacing the sink with
.print()
prints the data in the correct order. I have run a small toy sample multiple
times.

Is there a way to sort the entire dataset with parallelism > 1 and write it
to a single file in ascending order?
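
My guess, which I have not verified against Flink's behavior, is that the
single writer receives records from the sorted partitions interleaved rather
than one partition after another. Here is a minimal plain-Scala sketch of that
guess. No Flink is involved, and the sample keys, the three ranges, and the
round-robin arrival order are all made up for illustration:

```scala
object RangeSortSketch {
  // Hypothetical stand-in for the (Long, String) records in the job above.
  val records: Seq[(Long, String)] =
    Seq(7L, 2L, 9L, 4L, 1L, 8L, 3L, 6L, 5L).map(k => (k, s"v$k"))

  // Assumed range partitioner: keys 1-3 -> 0, keys 4-6 -> 1, keys 7-9 -> 2.
  def rangeOf(key: Long): Int = ((key - 1) / 3).toInt

  // Each partition holds one key range and is sorted locally.
  val partitions: Seq[Seq[(Long, String)]] =
    (0 until 3).map(p => records.filter(r => rangeOf(r._1) == p).sortBy(_._1))

  // Reading the partitions one after another, in range order.
  val concatenated: Seq[Long] = partitions.flatten.map(_._1)

  // Taking one record from each partition in turn (an assumed arrival order
  // at a single writer; the real order may differ).
  val interleaved: Seq[Long] =
    (0 until 3).flatMap(i => partitions.map(_(i)._1))

  // True if every element is <= its successor.
  def isSorted(xs: Seq[Long]): Boolean =
    xs.zip(xs.drop(1)).forall { case (a, b) => a <= b }

  def main(args: Array[String]): Unit = {
    println(s"concatenated: $concatenated sorted=${isSorted(concatenated)}")
    println(s"interleaved:  $interleaved sorted=${isSorted(interleaved)}")
  }
}
```

In this sketch the concatenated sequence is globally sorted and the
interleaved one is not, which matches the "local order" I see in the file.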
