Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

Becket Qin Fri, 24 Apr 2020 01:19:40 -0700

Hi Xuannan,

> I am not entirely sure if I understand the cases you mentioned. Users
> can use the cached table object returned by the .cache() method in other
> jobs, and those jobs should read the intermediate result. The intermediate
> result can be gone in the following three cases: 1. the user explicitly
> calls the invalidateCache() method; 2. the TableEnvironment is closed;
> 3. a failure happens on the TM. When that happens, the intermediate result
> will not be available unless it is re-generated.



What confused me was: why do we need a *cache.retries.max* config?
Shouldn't the missing intermediate result always be automatically
re-generated if it is gone?
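
To make the question concrete, here is a minimal Python sketch of the behavior being asked about, assuming that reading a missing intermediate result simply triggers regeneration. `CachedTable`, `read()`, `lose_result()` and `generations` are hypothetical names for illustration only, not the FLIP-36 API; `invalidate_cache()` merely stands in for the `invalidateCache()` method mentioned above.

```python
# Hypothetical model of the cache lifecycle discussed in this thread.
# None of these names are Flink APIs; they only illustrate the semantics.
from typing import Callable, List, Optional


class CachedTable:
    """A table whose intermediate result is cached after it is first produced."""

    def __init__(self, producer: Callable[[], List[tuple]]):
        self._producer = producer                 # re-runs the producing job
        self._result: Optional[List[tuple]] = None
        self.generations = 0                      # times the result was produced

    def _generate(self) -> None:
        self._result = self._producer()
        self.generations += 1

    def read(self) -> List[tuple]:
        # If the intermediate result is gone, regenerate it transparently
        # instead of retrying the read a configured number of times.
        if self._result is None:
            self._generate()
        return list(self._result)

    def invalidate_cache(self) -> None:
        # Case 1: the user explicitly drops the cached result.
        self._result = None

    def lose_result(self) -> None:
        # Case 3: simulate the result being lost due to a failure on the TM.
        self._result = None
```

In this model a read either hits the cache or re-runs the producing job once, so a retry limit such as *cache.retries.max* has no obvious role; whether the real design needs one is exactly the open question.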

Thanks,

Jiangjie (Becket) Qin


On Fri, Apr 24, 2020 at 3:59 PM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi Becket,
>
> Thanks for the comments.
>
> On Fri, Apr 24, 2020 at 9:12 AM Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Xuannan,
> >
> > Thanks for picking up the FLIP. It looks good to me overall. Some quick
> > comments / questions below:
> >
> > 1. Do we also need changes in the Java API?
> >
>
> Yes, the changes to the public interfaces of Table and TableEnvironment
> should also be made in the Java API.
>
>
> > 2. What are the cases in which users may want to retry reading the
> > intermediate result? It seems that once the intermediate result has
> > gone, it will not be available later without being generated again,
> > right?
> >
>
> I am not entirely sure if I understand the cases you mentioned. Users
> can use the cached table object returned by the .cache() method in other
> jobs, and those jobs should read the intermediate result. The intermediate
> result can be gone in the following three cases: 1. the user explicitly
> calls the invalidateCache() method; 2. the TableEnvironment is closed;
> 3. a failure happens on the TM. When that happens, the intermediate result
> will not be available unless it is re-generated.
>
> > 3. In the "semantic of cache() method" section, the description "The
> > semantic of the *cache()* method is a little different depending on
> > whether auto caching is enabled or not." does not seem to be explained.
> >
>
> This line is actually outdated and should be removed, as we are not adding
> the auto caching functionality in this FLIP. Auto caching will be added in
> the future, and the semantics of cache() when auto caching is enabled will
> be discussed in detail in a new FLIP. I will remove the description to
> avoid further confusion.
>
>
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Wed, Apr 22, 2020 at 4:00 PM Xuannan Su <suxuanna...@gmail.com> wrote:
> >
> > > Hi folks,
> > >
> > > I'd like to start the discussion about FLIP-36 Support Interactive
> > > Programming in Flink Table API
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> > >
> > > The FLIP proposes to add support for interactive programming in the
> > > Flink Table API. Specifically, it lets users cache intermediate results
> > > (tables) and use them in later jobs.
> > >
> > > Even though the FLIP has been discussed in the past[1], it hasn't
> > > formally passed the vote yet. Also, some of the design and
> > > implementation details have to change to incorporate the cluster
> > > partitions proposed in FLIP-67[2].
> > >
> > > Looking forward to your feedback.
> > >
> > > Thanks,
> > > Xuannan
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/b372fd7b962b9f37e4dace3bc8828f6e2a2b855e56984e58bc4a413f@%3Cdev.flink.apache.org%3E
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-67%3A+Cluster+partitions+lifecycle
> > >
> >
>
