Hi Kostas,

Thanks for starting this discussion. The first part of this FLIP, "Batch vs Streaming Scheduling", looks reasonable to me. However, I think there is another dimension we should also take into consideration: whether checkpointing is enabled.

This option is mostly, though not fully, orthogonal to the boundedness and persistence of the input. For example, consider an arbitrary operator that uses state. If the input is bounded and pipelined, we can enable checkpointing to achieve better failure recovery. If the input is bounded and persistent, we can still use checkpointing, but we might need to checkpoint the offset in the operator's intermediate result set. This would require much more work, and we can defer it to the future.

Beyond this dimension, there is another question to ask. If the topology mixes bounded and unbounded inputs, what would the behavior be? E.g. a join operator with one bounded input and one unbounded input. Can we still use BATCH or STREAMING to define the scheduling policy? What kind of failure recovery guarantee can Flink provide to users?

I don't have a clear answer for now, but I wanted to raise these questions for discussion.

Best,
Kurt

On Wed, Aug 12, 2020 at 11:22 PM Kostas Kloudas <kklou...@apache.org> wrote:

> Hi all,
>
> As described in FLIP-131 [1], we are aiming at deprecating the DataSet
> API in favour of the DataStream API and the Table API. After this work
> is done, the user will be able to write a program using the DataStream
> API and this will execute efficiently on both bounded and unbounded
> data. But before we reach this point, it is worth discussing and
> agreeing on the semantics of some operations as we transition from the
> streaming world to the batch one.
>
> This thread and the associated FLIP [2] aim at discussing these issues
> as these topics are pretty important to users and can lead to
> unpleasant surprises if we do not pay attention.
>
> Let's have a healthy discussion here and I will be updating the FLIP
> accordingly.
>
> Cheers,
> Kostas
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158871522
>