Hi David, Flink only supports sorting within partitions. Thus, if you want to write out a globally sorted dataset you should set the parallelism to 1 which effectively results in a single partition. Decreasing the parallelism of an operator will cause the individual partitions to lose its sort order because the individual partitions are read in a non deterministic order.
Cheers, Till On Thu, Feb 8, 2018 at 8:07 PM, david westwood <david.d.westw...@gmail.com> wrote: > Hi: > > I would like to sort historical data using the dataset api. > > env.setParallelism(10) > > val dataset = [(Long, String)] .. > .paritionByRange(_._1) > .sortPartition(_._1, Order.ASCEDING) > .writeAsCsv("mydata.csv").setParallelism(1) > > the data is out of order (in local order) > but > .print() > prints the data in to correct order. I have run a small toy sample > multiple times. > > Is there a way to sort the entire dataset with parallelism > 1 and write > it to a single file in ascending order? >