Hi all,

As described in FLIP-131 [1], we are aiming at deprecating the DataSet API in favour of the DataStream API and the Table API. After this work is done, the user will be able to write a program using the DataStream API and this will execute efficiently on both bounded and unbounded data. But before we reach this point, it is worth discussing and agreeing on the semantics of some operations as we transition from the streaming world to the batch one.

This thread and the associated FLIP [2] aim to discuss these issues, since they are pretty important to users and can lead to unpleasant surprises if we do not pay attention. Let's have a healthy discussion here, and I will update the FLIP accordingly.

Cheers,
Kostas

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158871522