One more question from my side: should we make sure this plays well with the remote shuffle service [1] in the case of a TM failure?
[1] https://github.com/flink-extended/flink-remote-shuffle

D.

On Thu, Dec 30, 2021 at 11:59 AM Gen Luo <luogen...@gmail.com> wrote:

> Hi Xuannan,
>
> I found FLIP-188 [1], which aims to introduce a built-in dynamic table
> storage that provides a unified changelog & table representation. Tables
> stored there can be used in further ad-hoc queries. To my understanding,
> it's quite like an implementation of caching in the Table API, and the
> ad-hoc queries are somewhat like further steps in an interactive program.
>
> As you replied, caching at the Table/SQL API is the next step, as a part
> of interactive programming in the Table API, which we all agree is the
> major scenario. What do you think about the relation between it and
> FLIP-188?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
>
> On Wed, Dec 29, 2021 at 7:53 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi David,
> >
> > Thanks for sharing your thoughts.
> >
> > You are right that most people tend to use the high-level APIs for
> > interactive data exploration. There is actually FLIP-36 [1], which
> > covers the cache API at the Table/SQL API. As far as I know, it has
> > been accepted but hasn't been implemented. At the time it was drafted,
> > the DataStream API did not support batch mode, but the Table API did.
> >
> > Now that the DataStream API does support batch processing, I think we
> > can focus on supporting cache in DataStream first. It is still
> > valuable for DataStream users, and most of the work we do in this FLIP
> > can be reused. So I want to limit the scope of this FLIP.
> >
> > After caching is supported in DataStream, we can continue from where
> > FLIP-36 left off to support caching at the Table/SQL API. We might
> > have to re-vote FLIP-36 or draft a new FLIP. What do you think?
> >
> > Best,
> > Xuannan
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> >
> > On Wed, Dec 29, 2021 at 6:08 PM David Morávek <d...@apache.org> wrote:
> > >
> > > Hi Xuannan,
> > >
> > > thanks for drafting this FLIP.
> > >
> > > One immediate thought: from what I've seen of interactive data
> > > exploration with Spark, most people tend to use the higher-level
> > > APIs that allow for faster prototyping (the Table API in Flink's
> > > case). Should the Table API also be covered by this FLIP?
> > >
> > > Best,
> > > D.
> > >
> > > On Wed, Dec 29, 2021 at 10:36 AM Xuannan Su <suxuanna...@gmail.com>
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I'd like to start a discussion about adding support for caching
> > > > the intermediate result at the DataStream API for batch
> > > > processing.
> > > >
> > > > As the DataStream API now supports batch execution mode, we see
> > > > users using the DataStream API to run batch jobs. Interactive
> > > > programming is an important use case of Flink batch processing,
> > > > and the ability to cache intermediate results of a DataStream is
> > > > crucial to the interactive programming experience.
> > > >
> > > > Therefore, we propose to support caching a DataStream in batch
> > > > execution. We believe that users can benefit a lot from the
> > > > change and encourage them to use the DataStream API for their
> > > > interactive batch processing work.
> > > >
> > > > Please check out FLIP-205 [1] and feel free to reply to this
> > > > email thread. Looking forward to your feedback!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-205%3A+Support+Cache+in+DataStream+for+Batch+Processing
> > > >
> > > > Best,
> > > > Xuannan
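For illustration, here is a minimal sketch of how the proposed caching could look from the user side, assuming a cache() method on the result of a DataStream transformation roughly as described in FLIP-205; the method name, return type, and exact semantics are assumptions based on the proposal, not a finalized API:

// Hypothetical usage sketch of the caching proposed in FLIP-205.
// The cache() call below is an assumption; the final API may differ.
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CacheSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // An "expensive" upstream pipeline whose intermediate result we
        // would like to reuse across several short exploratory queries.
        SingleOutputStreamOperator<Integer> preprocessed =
                env.fromElements(1, 2, 3, 4, 5)
                        .map(x -> x * x)
                        .returns(Types.INT);

        // Hypothetical cache() hook: the first job that consumes the
        // stream materializes the intermediate result; later jobs
        // submitted from the same environment read it back instead of
        // recomputing the upstream pipeline.
        var cached = preprocessed.cache();

        cached.filter(x -> x > 4).print();
        env.execute("first exploratory query");   // computes and caches

        cached.map(x -> x + 1).returns(Types.INT).print();
        env.execute("second exploratory query");  // reuses cached result
    }
}

The intent is that the first env.execute() materializes the intermediate result, and subsequent submissions from the same environment consume the cached data rather than recomputing the upstream pipeline.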