Hi Dawid,

I've entered a ticket: https://issues.apache.org/jira/browse/FLINK-21763. Personally, I can keep using the DataSet API for now, but if it is going to be deprecated at some point, it would be good to migrate sooner rather than later.
Regards,
Alexis

________________________________
From: Dawid Wysakowicz
Sent: Friday, March 12, 2021 4:10 PM
To: Alexis Sarda-Espinosa; user@flink.apache.org
Subject: Re: DataStream in batch mode - handling (un)ordered bounded data

Hi Alexis,

As of now there is no such feature in the DataStream API. The batch mode in the DataStream API is a new feature, and we would be interested to hear about the use cases people want to use it for, so we can identify potential areas to improve. What you are suggesting generally makes sense, so I think it would be nice if you could create a JIRA ticket for it.

Best,
Dawid

On 12/03/2021 15:37, Alexis Sarda-Espinosa wrote:

Hello,

Regarding the new BATCH mode of the DataStream API, I see that the documentation states that some operators will process all data for a given key before moving on to the next one. However, I don't see how Flink is supposed to know whether the input will provide all data for a given key sequentially. In the DataSet API, an (undocumented?) feature is using SplitDataProperties (https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/api/java/io/SplitDataProperties.html) to specify different grouping/partitioning/sorting properties of the input splits, so if the data is pre-sorted (e.g. when reading from a database), some operations can be optimized. Will the DataStream API get something similar?

Regards,
Alexis.
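[For readers landing on this thread: the BATCH execution mode being discussed is enabled on a regular StreamExecutionEnvironment. A minimal sketch (Flink 1.12+, not from the thread itself):]

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Run this DataStream program with batch semantics over bounded input;
// keyed operators then see all data for one key before the next.
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
```

The mode can also be set from the command line via `-Dexecution.runtime-mode=BATCH`, which keeps the choice out of the program code.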
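[For context, a minimal sketch of the DataSet API feature Alexis refers to: declaring on the DataSource that its input splits are already partitioned/grouped/sorted, so the optimizer can skip redundant shuffles and sorts. The file path, field indices, and CSV source below are illustrative assumptions, not from the thread.]

```java
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;

public class PresortedInputSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input that is already sorted by the key in field 0,
        // e.g. an export produced by an ORDER BY query.
        DataSource<Tuple2<Long, String>> source = env
                .readCsvFile("/path/to/presorted-input.csv")
                .types(Long.class, String.class);

        // Tell the optimizer the splits are partitioned and grouped by
        // field 0, and sorted ascending on it within each split.
        source.getSplitDataProperties()
                .splitsPartitionedBy(0)
                .splitsGroupedBy(0)
                .splitsOrderedBy(new int[]{0}, new Order[]{Order.ASCENDING});

        // A groupBy(0) reduce can now reuse the existing data layout
        // instead of re-shuffling and re-sorting.
        source.groupBy(0)
                .reduce((a, b) -> new Tuple2<>(a.f0, a.f1 + "," + b.f1))
                .print();
    }
}
```

Note the usual caveat from the Javadoc: these properties are trusted, not checked, so declaring them on data that is not actually laid out that way produces incorrect results.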