Hi Martin,

Rebalance and Rescale use round robin and are deterministic in your case (assuming the same task managers and slots). You just need to stay clear of ShufflePartitioner.
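If you would rather pin the distribution down explicitly, the custom-partitioner route Julian suggested (quoted below) could look roughly like the following. This is only a sketch that runs without Flink: `partitionFor` and `distribute` are hypothetical helper names, and hashing the line content is an assumption on my side, not something your job requires.

```scala
// Minimal sketch of a deterministic partition choice, runnable without Flink.
// `partitionFor` and `distribute` are hypothetical helpers for illustration.
object DeterministicPartitioning {

  // Maps a key to a partition index in [0, numPartitions).
  // String.hashCode is specified by the Java platform, so the result is
  // identical on every run and every JVM.
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Simulates a parallelism-1 source fanning out to numPartitions subtasks.
  // Input order is preserved inside each partition.
  def distribute(lines: Seq[String], numPartitions: Int): Map[Int, Seq[String]] =
    lines.groupBy(line => partitionFor(line, numPartitions))

  def main(args: Array[String]): Unit = {
    val lines = Seq("a", "b", "c", "d", "e", "f")
    val first = distribute(lines, 4)
    val second = distribute(lines, 4)
    // Two independent runs yield the same assignment.
    println(first == second)
    first.toSeq.sortBy(_._1).foreach { case (partition, assigned) =>
      println(s"$partition -> ${assigned.mkString(",")}")
    }
  }
}
```

In a real job the same function could be wrapped in an `org.apache.flink.api.common.functions.Partitioner[String]` and passed to `rawStream.partitionCustom(partitioner, line => line)`. Note that hashing by content sends identical lines to the same subtask and does not guarantee an even spread, which should be acceptable when only reproducibility matters.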
If you are seeing something non-deterministic, could you please share an example?

On Fri, Jan 15, 2021 at 7:19 AM Martin Frank Hansen <m...@berlingskemedia.dk> wrote:

> Hi Jaffe,
>
> Thanks for your reply, I will try to use a custom partitioner.
>
> On Thu, Jan 14, 2021 at 7:39 PM Jaffe, Julian <julianja...@activision.com> wrote:
>
>> Martin,
>>
>> You can use `.partitionCustom` and provide a partitioner if you want to
>> control explicitly how elements are distributed to downstream tasks.
>>
>> From: Martin Frank Hansen <m...@berlingskemedia.dk>
>> Reply-To: "m...@berlingskemedia.dk" <m...@berlingskemedia.dk>
>> Date: Thursday, January 14, 2021 at 1:48 AM
>> To: user <user@flink.apache.org>
>> Subject: Deterministic rescale for test
>>
>> Hi,
>>
>> I am trying to make a test suite for our Flink jobs, and am having
>> problems making the input data deterministic.
>>
>> We are reading a file input with parallelism 1 and want to rescale to a
>> higher parallelism, such that the ordering of the data is the same every
>> time.
>>
>> I have tried using rebalance and rescale, but they seem to distribute
>> data randomly between partitions. We don't need something optimized, we
>> just need the same distribution for every run.
>>
>> Is this possible?
>>
>> Some code:
>>
>> val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>> env.setParallelism(parallelism)
>> val rawStream: DataStream[String] =
>>   env.readTextFile(s3Path).setParallelism(1)
>>
>> rawStream.rescale
>>
>> ...
>>
>> best regards
>>
>> --
>> Martin Frank Hansen
>
> --
> Martin Frank Hansen

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>