One more question from my side: should we make sure this plays well with the remote shuffle service [1] in the case of a TM failure?
[1] https://github.com/flink-extended/flink-remote-shuffle

D.

On Thu, Dec 30, 2021 at 11:59 AM Gen Luo <luogen...@gmail.com> wrote:

> Hi Xuannan,
>
> I found FLIP-188 [1], which aims to introduce a built-in dynamic table
> storage that provides a unified changelog & table representation. Tables
> stored there can be used in further ad-hoc queries. To my understanding,
> it's quite like an implementation of caching in the Table API, and the
> ad-hoc queries are somewhat like further steps in an interactive program.
>
> As you replied, caching at the Table/SQL API is the next step, as a part
> of interactive programming in the Table API, which we all agree is the
> major scenario. What do you think about the relation between it and
> FLIP-188?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
>
> On Wed, Dec 29, 2021 at 7:53 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi David,
> >
> > Thanks for sharing your thoughts.
> >
> > You are right that most people tend to use the high-level APIs for
> > interactive data exploration. There is actually FLIP-36 [1], which
> > covers the cache API at the Table/SQL API. As far as I know, it has
> > been accepted but hasn't been implemented. At the time it was drafted,
> > the DataStream API did not support batch mode, but the Table API did.
> >
> > Now that the DataStream API does support batch processing, I think we
> > can focus on supporting cache in DataStream first. It is still
> > valuable for DataStream users, and most of the work we do in this FLIP
> > can be reused. So I want to limit the scope of this FLIP.
> >
> > After caching is supported in DataStream, we can continue from where
> > FLIP-36 left off to support caching at the Table/SQL API. We might
> > have to re-vote FLIP-36 or draft a new FLIP. What do you think?
> >
> > Best,
> > Xuannan
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> >
> > On Wed, Dec 29, 2021 at 6:08 PM David Morávek <d...@apache.org> wrote:
> > >
> > > Hi Xuannan,
> > >
> > > thanks for drafting this FLIP.
> > >
> > > One immediate thought: from what I've seen of interactive data
> > > exploration with Spark, most people tend to use the higher-level
> > > APIs that allow for faster prototyping (the Table API in Flink's
> > > case). Should the Table API also be covered by this FLIP?
> > >
> > > Best,
> > > D.
> > >
> > > On Wed, Dec 29, 2021 at 10:36 AM Xuannan Su <suxuanna...@gmail.com>
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I'd like to start a discussion about adding support for caching
> > > > the intermediate result at the DataStream API for batch
> > > > processing.
> > > >
> > > > As the DataStream API now supports batch execution mode, we see
> > > > users using the DataStream API to run batch jobs. Interactive
> > > > programming is an important use case of Flink batch processing,
> > > > and the ability to cache intermediate results of a DataStream is
> > > > crucial to the interactive programming experience.
> > > >
> > > > Therefore, we propose to support caching a DataStream in batch
> > > > execution. We believe that users can benefit a lot from the
> > > > change and encourage them to use the DataStream API for their
> > > > interactive batch processing work.
> > > >
> > > > Please check out FLIP-205 [1] and feel free to reply to this
> > > > email thread. Looking forward to your feedback!
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-205%3A+Support+Cache+in+DataStream+for+Batch+Processing
> > > >
> > > > Best,
> > > > Xuannan
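For illustration, here is a minimal sketch of how the proposed caching could look from the user side, assuming a cache() method on the result of a DataStream transformation roughly as described in FLIP-205; the method name, return type, and exact semantics are assumptions based on the proposal, not a finalized API:

// Hypothetical usage sketch of the caching proposed in FLIP-205.
// The cache() call below is an assumption; the final API may differ.
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CacheSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // An "expensive" upstream pipeline whose intermediate result we
        // would like to reuse across several short exploratory queries.
        SingleOutputStreamOperator<Integer> preprocessed =
                env.fromElements(1, 2, 3, 4, 5)
                        .map(x -> x * x)
                        .returns(Types.INT);

        // Hypothetical cache() hook: the first job that consumes the
        // stream materializes the intermediate result; later jobs
        // submitted from the same environment read it back instead of
        // recomputing the upstream pipeline.
        var cached = preprocessed.cache();

        cached.filter(x -> x > 4).print();
        env.execute("first exploratory query");   // computes and caches

        cached.map(x -> x + 1).returns(Types.INT).print();
        env.execute("second exploratory query");  // reuses cached result
    }
}

The intent is that the first env.execute() materializes the intermediate result, and subsequent submissions from the same environment consume the cached data rather than recomputing the upstream pipeline.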