Hi Dawid,

I've entered a ticket: https://issues.apache.org/jira/browse/FLINK-21763. Personally, I can keep using the DataSet API for now, but if it is going to be deprecated at some point, it would be good to migrate sooner rather than later.
Regards,
Alexis

________________________________
From: Dawid Wysakowicz
Sent: Friday, March 12, 2021 4:10 PM
To: Alexis Sarda-Espinosa; user@flink.apache.org
Subject: Re: DataStream in batch mode - handling (un)ordered bounded data

Hi Alexis,

As of now there is no such feature in the DataStream API. The batch mode in the DataStream API is a new feature, and we would be interested to hear about the use cases people want to use it for, so we can identify potential areas to improve. What you are suggesting generally makes sense, so I think it would be nice if you could create a JIRA ticket for it.

Best,
Dawid

On 12/03/2021 15:37, Alexis Sarda-Espinosa wrote:

Hello,

Regarding the new BATCH mode of the DataStream API, I see that the documentation states that some operators will process all data for a given key before moving on to the next one. However, I don't see how Flink is supposed to know whether the input will provide all data for a given key sequentially. In the DataSet API, an (undocumented?) feature is using SplitDataProperties (https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/api/java/io/SplitDataProperties.html) to specify different grouping/partitioning/sorting properties of the input splits, so if the data is pre-sorted (e.g. when reading from a database), some operations can be optimized. Will the DataStream API get something similar?

Regards,
Alexis.
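[For readers landing on this thread: the BATCH execution mode being discussed is enabled on a regular StreamExecutionEnvironment. A minimal sketch (Flink 1.12+, not from the thread itself):]

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Run this DataStream program with batch semantics over bounded input;
// keyed operators then see all data for one key before the next.
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
```

The mode can also be set from the command line via `-Dexecution.runtime-mode=BATCH`, which keeps the choice out of the program code.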
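[For context, a minimal sketch of the DataSet API feature Alexis refers to: declaring on the DataSource that its input splits are already partitioned/grouped/sorted, so the optimizer can skip redundant shuffles and sorts. The file path, field indices, and CSV source below are illustrative assumptions, not from the thread.]

```java
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;

public class PresortedInputSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input that is already sorted by the key in field 0,
        // e.g. an export produced by an ORDER BY query.
        DataSource<Tuple2<Long, String>> source = env
                .readCsvFile("/path/to/presorted-input.csv")
                .types(Long.class, String.class);

        // Tell the optimizer the splits are partitioned and grouped by
        // field 0, and sorted ascending on it within each split.
        source.getSplitDataProperties()
                .splitsPartitionedBy(0)
                .splitsGroupedBy(0)
                .splitsOrderedBy(new int[]{0}, new Order[]{Order.ASCENDING});

        // A groupBy(0) reduce can now reuse the existing data layout
        // instead of re-shuffling and re-sorting.
        source.groupBy(0)
                .reduce((a, b) -> new Tuple2<>(a.f0, a.f1 + "," + b.f1))
                .print();
    }
}
```

Note the usual caveat from the Javadoc: these properties are trusted, not checked, so declaring them on data that is not actually laid out that way produces incorrect results.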