Martin,

You can use `.partitionCustom` and provide a partitioner if you want to control 
explicitly how elements are distributed to downstream tasks.

From: Martin Frank Hansen <m...@berlingskemedia.dk>
Reply-To: "m...@berlingskemedia.dk" <m...@berlingskemedia.dk>
Date: Thursday, January 14, 2021 at 1:48 AM
To: user <user@flink.apache.org>
Subject: Deterministic rescale for test

Hi,

I am trying to make a test-suite for our Flink jobs, and are having problems 
making the input-data deterministic.

We are reading a file-input with parallelism 1 and want to rescale to a higher 
parallelism, such that the ordering of the data is the same every time.

I have tried using rebalance, rescale but it seems to randomly distribute data 
between partitions. We don't need something optimized, we just need the same 
distribution for every run.
Is this possible?

Some code:

val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(parallelism)
val rawStream: DataStream[String] = env.readTextFile(s3Path).setParallelism(1)

rawStream.rescale

...
best regards

--

Martin Frank Hansen


Reply via email to