Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

Xuannan Su Wed, 29 Apr 2020 00:54:23 -0700

Hi folks,

FLIP-36 has been updated according to the discussion with Becket. In the
meantime, any further comments are very welcome.


If there are no further comments, I would like to start the voting
thread by tomorrow.

Thanks,
Xuannan


On Sun, Apr 26, 2020 at 9:34 AM Xuannan Su <[email protected]> wrote:

> Hi Becket,
>
> You are right. It makes sense to treat the retry of job 2 as an ordinary job.
> And the config does introduce some unnecessary confusion. Thank you for your
> comment. I will update the FLIP.
>
> Best,
> Xuannan
>
> On Sat, Apr 25, 2020 at 7:44 AM Becket Qin <[email protected]> wrote:
>
>> Hi Xuannan,
>>
>> Suppose a user submits Job 1, which generates a cached intermediate result,
>> and later submits Job 2, which should ideally use that intermediate result.
>> In that case, if Job 2 fails because the intermediate result is missing, Job
>> 2 should be retried with its full DAG. When Job 2 then runs, it will
>> also re-generate the cache. However, once Job 2 has fallen back to the
>> original DAG, should it just be treated as an ordinary job that follows the
>> recovery strategy? Having a separate configuration seems a little
>> confusing. In other words, re-generating the cache is just a byproduct of
>> running the full DAG of Job 2, not its main purpose. It is just like
>> when Job 1 runs to generate the cache: it does not have a separate retry
>> config to make sure the cache is generated. If it fails, it just fails like
>> an ordinary job.
>>
>> What do you think?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Apr 24, 2020 at 5:00 PM Xuannan Su <[email protected]> wrote:
>>
>> > Hi Becket,
>> >
>> > The intermediate result will indeed be automatically re-generated by
>> > resubmitting the original DAG. And that job could fail as well. In that
>> > case, we need to decide if we should resubmit the original DAG to
>> > re-generate the intermediate result or give up and throw an exception to
>> > the user. And the config is to indicate how many resubmissions should happen
>> > before giving up.
>> >
>> > Thanks,
>> > Xuannan
>> >
>> > On Fri, Apr 24, 2020 at 4:19 PM Becket Qin <[email protected]> wrote:
>> >
>> > > Hi Xuannan,
>> > >
>> > > > I am not entirely sure if I understand the cases you mentioned. The users
>> > > > can use the cached table object returned by the .cache() method in another
>> > > > job, and it should read the intermediate result. The intermediate result can
>> > > > be gone in the following three cases: 1. the user explicitly calls the
>> > > > invalidateCache() method 2. the TableEnvironment is closed 3. a failure
>> > > > happens on the TM. When that happens, the intermediate result will not be
>> > > > available unless it is re-generated.
>> > >
>> > >
>> > > What confused me was why we need to have a *cache.retries.max* config.
>> > > Shouldn't the missing intermediate result always be automatically
>> > > re-generated if it is gone?
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Fri, Apr 24, 2020 at 3:59 PM Xuannan Su <[email protected]> wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > Thanks for the comments.
>> > > >
>> > > > On Fri, Apr 24, 2020 at 9:12 AM Becket Qin <[email protected]> wrote:
>> > > >
>> > > > > Hi Xuannan,
>> > > > >
>> > > > > Thanks for picking up the FLIP. It looks good to me overall. Some quick
>> > > > > comments / questions below:
>> > > > >
>> > > > > 1. Do we also need changes in the Java API?
>> > > > >
>> > > >
>> > > > Yes, the changes to the public interface of Table and TableEnvironment
>> > > > should be made in the Java API.
>> > > >
>> > > >
>> > > > > 2. What are the cases that users may want to retry reading the intermediate
>> > > > > result? It seems that once the intermediate result has gone, it will not be
>> > > > > available later without being generated again, right?
>> > > > >
>> > > >
>> > > > I am not entirely sure if I understand the cases you mentioned. The users
>> > > > can use the cached table object returned by the .cache() method in another
>> > > > job, and it should read the intermediate result. The intermediate result can
>> > > > be gone in the following three cases: 1. the user explicitly calls the
>> > > > invalidateCache() method 2. the TableEnvironment is closed 3. a failure
>> > > > happens on the TM. When that happens, the intermediate result will not be
>> > > > available unless it is re-generated.
>> > > >
>> > > > > 3. In the "semantic of cache() method" section, the description "The
>> > > > > semantic of the *cache()* method is a little different depending on whether
>> > > > > auto caching is enabled or not." seems not explained.
>> > > > >
>> > > >
>> > > > This line is actually outdated and should be removed, as we are not adding
>> > > > the auto caching functionality in this FLIP. Auto caching will be added in
>> > > > the future, and the semantic of cache() when auto caching is enabled will
>> > > > be discussed in detail in a new FLIP. I will remove the description to avoid
>> > > > further confusion.
>> > > >
>> > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Apr 22, 2020 at 4:00 PM Xuannan Su <[email protected]> wrote:
>> > > > >
>> > > > > > Hi folks,
>> > > > > >
>> > > > > > I'd like to start the discussion about FLIP-36 Support Interactive
>> > > > > > Programming in Flink Table API
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>> > > > > >
>> > > > > > The FLIP proposes to add support for interactive programming in the Flink
>> > > > > > Table API. Specifically, it lets users cache intermediate results (tables)
>> > > > > > and use them in later jobs.
>> > > > > >
>> > > > > > Even though the FLIP has been discussed in the past[1], it hasn't
>> > > > > > formally passed a vote yet. And some of the design and implementation
>> > > > > > details have to change to incorporate the cluster partitions proposed in
>> > > > > > FLIP-67[2].
>> > > > > >
>> > > > > > Looking forward to your feedback.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Xuannan
>> > > > > >
>> > > > > > [1]
>> > > > > > https://lists.apache.org/thread.html/b372fd7b962b9f37e4dace3bc8828f6e2a2b855e56984e58bc4a413f@%3Cdev.flink.apache.org%3E
>> > > > > > [2]
>> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-67%3A+Cluster+partitions+lifecycle
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
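
For readers following the thread, the behavior agreed on above can be
sketched roughly as follows: a job that finds its cached intermediate result
missing is rerun with its full DAG, the cache is re-generated as a byproduct,
and any further failure is handled like that of an ordinary job, with no
cache-specific retry setting. This is a toy model in plain Python, not Flink
code. ToySession, run_job, lose_partition and CacheMissing are hypothetical
names made up for illustration; only cache() and invalidateCache() echo
method names mentioned in the discussion, and their real signatures may
differ.

```python
class CacheMissing(Exception):
    """Raised when a job tries to read a cached result that no longer exists."""


class ToySession:
    """Hypothetical stand-in for a TableEnvironment that owns cached results."""

    def __init__(self):
        self._cache = {}  # cache id -> materialized rows
        self.runs = []    # log of which plan each execution used

    def cache(self, cache_id, compute):
        """Run the full DAG and keep its result as a byproduct."""
        self.runs.append(("full", cache_id))
        self._cache[cache_id] = compute()
        return self._cache[cache_id]

    def invalidate_cache(self, cache_id):
        """Case 1 from the thread: the user drops the cache explicitly."""
        self._cache.pop(cache_id, None)

    def close(self):
        """Case 2 from the thread: closing the environment drops every cache."""
        self._cache.clear()

    def lose_partition(self, cache_id):
        """Case 3 from the thread: a TM failure loses the stored result."""
        self._cache.pop(cache_id, None)

    def _read_cache(self, cache_id):
        if cache_id not in self._cache:
            raise CacheMissing(cache_id)
        self.runs.append(("cached", cache_id))
        return self._cache[cache_id]

    def run_job(self, cache_id, compute, downstream):
        """Prefer the cached result; otherwise fall back to the full DAG."""
        try:
            rows = self._read_cache(cache_id)
        except CacheMissing:
            # Fallback: rerun the original DAG. Any error raised here simply
            # propagates, as for an ordinary job; there is no separate
            # cache retry counter in this sketch.
            rows = self.cache(cache_id, compute)
        return downstream(rows)


session = ToySession()
source = lambda: [1, 2, 3]

session.cache("t1", source)             # job 1 generates the cache
session.run_job("t1", source, sum)      # job 2 reads the cached result
session.lose_partition("t1")            # the intermediate result is gone
session.run_job("t1", source, sum)      # job 2 falls back to the full DAG
```

After the last call the cache exists again, so a later job would read it
without recomputation.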
