The reasons why this doesn't work in Flink are that
* a file can only be written by a single process
* Flink does not support merging of sorted network partitions but reads
round-robin from incoming network channels.
I think if you sort the historic data in parallel (without range
partitioning, i.e., each partition is sorted on its own) and write one file
per partition, you can merge the sorted files yourself afterwards.
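A minimal sketch of that external merge step, assuming each partition file
is already locally sorted and each line starts with a long timestamp
followed by a comma (the directory name, file layout, and record format are
hypothetical):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.PriorityQueue;

// k-way merge of locally sorted partition files into one globally sorted file
public class MergeSortedPartitions {

    // One open reader plus its current line, ordered by the line's leading timestamp
    static class Head implements Comparable<Head> {
        final BufferedReader reader;
        String line;

        Head(BufferedReader reader, String line) {
            this.reader = reader;
            this.line = line;
        }

        long timestamp() {
            // assumes records look like "<timestamp>,<payload>"
            return Long.parseLong(line.substring(0, line.indexOf(',')));
        }

        @Override
        public int compareTo(Head other) {
            return Long.compare(timestamp(), other.timestamp());
        }
    }

    public static void main(String[] args) throws IOException {
        PriorityQueue<Head> heads = new PriorityQueue<>();

        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("sorted-partitions"));
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(Paths.get("merged.csv")))) {
            // open every partition file and seed the heap with its first record
            for (Path file : dir) {
                BufferedReader reader = Files.newBufferedReader(file);
                String first = reader.readLine();
                if (first != null) {
                    heads.add(new Head(reader, first));
                } else {
                    reader.close();
                }
            }

            // repeatedly emit the smallest head and advance its reader
            while (!heads.isEmpty()) {
                Head smallest = heads.poll();
                out.println(smallest.line);
                String next = smallest.reader.readLine();
                if (next != null) {
                    smallest.line = next;
                    heads.add(smallest);
                } else {
                    smallest.reader.close();
                }
            }
        }
    }
}
```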
Thanks.
I have to stream in the historical data, and its out-of-orderness is far
greater than that of the real-time data. I thought there was some elegant
way using mapPartition that I wasn't seeing.
On Fri, Feb 9, 2018 at 5:10 AM, Fabian Hueske wrote:
You can also partition by range, sort each partition, and write each
partition to its own file. Once all partitions have been written to files,
you can concatenate the files.
As Till said, it is not possible to sort in parallel and write in order to a
single file.
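A minimal sketch of that recipe with the DataSet API, assuming
(timestamp, payload) records as Tuple2<Long, String> (the inline input and
the output path are placeholders):

```java
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class RangePartitionedSort {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // placeholder input: (timestamp, payload) records
        DataSet<Tuple2<Long, String>> records = env.fromElements(
                Tuple2.of(3L, "c"), Tuple2.of(1L, "a"), Tuple2.of(2L, "b"));

        records
                // range-partition on the timestamp so partition i holds
                // smaller keys than partition i + 1
                .partitionByRange(0)
                // sort every partition locally on the same key
                .sortPartition(0, Order.ASCENDING)
                // each parallel sink task writes one numbered file under the
                // directory; concatenating them in ascending task order then
                // yields a globally sorted result
                .writeAsCsv("file:///tmp/sorted-partitions");

        env.execute("range partition and sort");
    }
}
```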
Best, Fabian
2018-02-09 10:35 GMT+01:00 Till Rohrmann:
Hi David,
Flink only supports sorting within partitions. Thus, if you want to write
out a globally sorted dataset, you should set the parallelism to 1, which
effectively results in a single partition. Decreasing the parallelism of an
operator will cause the individual partitions to lose their sort order.
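A minimal sketch of the parallelism-1 approach Till describes, again
assuming Tuple2<Long, String> records (the inline input and the output path
are placeholders):

```java
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class GloballySortedOutput {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, String>> records = env.fromElements(
                Tuple2.of(3L, "c"), Tuple2.of(1L, "a"), Tuple2.of(2L, "b"));

        records
                // a single partition makes the local sort a global sort
                .sortPartition(0, Order.ASCENDING).setParallelism(1)
                // keep the sink at parallelism 1 as well so a single,
                // in-order file is written
                .writeAsCsv("file:///tmp/globally-sorted.csv").setParallelism(1);

        env.execute("globally sorted output");
    }
}
```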