Hi folks,

FLIP-36 has been updated according to the discussion with Becket. In the meantime, any comments are very welcome.
If there are no further comments, I would like to start the voting thread tomorrow.

Thanks,
Xuannan

On Sun, Apr 26, 2020 at 9:34 AM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi Becket,
>
> You are right. It makes sense to treat the retry of job 2 as an ordinary job.
> And the config does introduce some unnecessary confusion. Thank you for your
> comment. I will update the FLIP.
>
> Best,
> Xuannan
>
> On Sat, Apr 25, 2020 at 7:44 AM Becket Qin <becket....@gmail.com> wrote:
>
>> Hi Xuannan,
>>
>> Suppose a user submits job 1, which generates a cached intermediate result,
>> and later submits job 2, which should ideally use that intermediate result.
>> In that case, if job 2 fails because the intermediate result is missing,
>> job 2 should be retried with its full DAG. After that, when job 2 runs, it
>> will also re-generate the cache. However, once job 2 has fallen back to the
>> original DAG, should it just be treated as an ordinary job that follows the
>> recovery strategy? Having a separate configuration seems a little
>> confusing. In other words, re-generating the cache is just a byproduct of
>> running the full DAG of job 2, not the main purpose. It is just like
>> when job 1 runs to generate the cache: it does not have a separate retry
>> config to make sure the cache is generated. If it fails, it just fails like
>> an ordinary job.
>>
>> What do you think?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Apr 24, 2020 at 5:00 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>>
>> > Hi Becket,
>> >
>> > The intermediate result will indeed be automatically re-generated by
>> > resubmitting the original DAG. And that job could fail as well. In that
>> > case, we need to decide whether to resubmit the original DAG again to
>> > re-generate the intermediate result or to give up and throw an exception
>> > to the user. The config indicates how many resubmissions should happen
>> > before giving up.
>> >
>> > Thanks,
>> > Xuannan
>> >
>> > On Fri, Apr 24, 2020 at 4:19 PM Becket Qin <becket....@gmail.com> wrote:
>> >
>> > > Hi Xuannan,
>> > >
>> > > > I am not entirely sure if I understand the cases you mentioned. The
>> > > > users can use the cached table object returned by the .cache() method
>> > > > in other jobs, and it should read the intermediate result. The
>> > > > intermediate result can be gone in the following three cases: 1. the
>> > > > user explicitly calls the invalidateCache() method 2. the
>> > > > TableEnvironment is closed 3. a failure happens on the TM. When that
>> > > > happens, the intermediate result will not be available unless it is
>> > > > re-generated.
>> > >
>> > > What confused me was why we need to have a *cache.retries.max* config.
>> > > Shouldn't the missing intermediate result always be automatically
>> > > re-generated if it is gone?
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Fri, Apr 24, 2020 at 3:59 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > Thanks for the comments.
>> > > >
>> > > > On Fri, Apr 24, 2020 at 9:12 AM Becket Qin <becket....@gmail.com> wrote:
>> > > >
>> > > > > Hi Xuannan,
>> > > > >
>> > > > > Thanks for picking up the FLIP. It looks good to me overall. Some
>> > > > > quick comments / questions below:
>> > > > >
>> > > > > 1. Do we also need changes in the Java API?
>> > > >
>> > > > Yes, the public interface changes to Table and TableEnvironment should
>> > > > be made in the Java API.
>> > > >
>> > > > > 2. What are the cases in which users may want to retry reading the
>> > > > > intermediate result? It seems that once the intermediate result is
>> > > > > gone, it will not be available later without being generated again,
>> > > > > right?
>> > > >
>> > > > I am not entirely sure if I understand the cases you mentioned.
>> > > > The users can use the cached table object returned by the .cache()
>> > > > method in other jobs, and it should read the intermediate result. The
>> > > > intermediate result can be gone in the following three cases: 1. the
>> > > > user explicitly calls the invalidateCache() method 2. the
>> > > > TableEnvironment is closed 3. a failure happens on the TM. When that
>> > > > happens, the intermediate result will not be available unless it is
>> > > > re-generated.
>> > > >
>> > > > > 3. In the "semantic of cache() method" section, the description "The
>> > > > > semantic of the *cache()* method is a little different depending on
>> > > > > whether auto caching is enabled or not." does not seem to be
>> > > > > explained.
>> > > >
>> > > > This line is actually outdated and should be removed, as we are not
>> > > > adding the auto caching functionality in this FLIP. Auto caching will
>> > > > be added in the future, and the semantics of cache() when auto caching
>> > > > is enabled will be discussed in detail in a new FLIP. I will remove
>> > > > the description to avoid further confusion.
>> > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > > On Wed, Apr 22, 2020 at 4:00 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi folks,
>> > > > > >
>> > > > > > I'd like to start the discussion about FLIP-36 Support Interactive
>> > > > > > Programming in Flink Table API
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>> > > > > >
>> > > > > > The FLIP proposes to add support for interactive programming in the
>> > > > > > Flink Table API. Specifically, it lets users cache the intermediate
>> > > > > > results (tables) and use them in later jobs.
>> > > > > >
>> > > > > > Even though the FLIP has been discussed in the past[1], the FLIP
>> > > > > > hasn't formally passed the vote yet. And some of the design and
>> > > > > > implementation details have to change to incorporate the cluster
>> > > > > > partition proposed in FLIP-67[2].
>> > > > > >
>> > > > > > Looking forward to your feedback.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Xuannan
>> > > > > >
>> > > > > > [1]
>> > > > > > https://lists.apache.org/thread.html/b372fd7b962b9f37e4dace3bc8828f6e2a2b855e56984e58bc4a413f@%3Cdev.flink.apache.org%3E
>> > > > > > [2]
>> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-67%3A+Cluster+partitions+lifecycle
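
The fallback behavior agreed on in this thread (a job that cannot find the cached intermediate result is resubmitted with its full DAG, and from then on fails or recovers like any ordinary job) can be sketched roughly as below. This is a toy model under stated assumptions, not Flink code: ToyCluster, CacheMissError, and submit are hypothetical names invented for illustration, and the "DAG" is just a list transformation.

```python
class CacheMissError(Exception):
    """Raised when a job reads a cached intermediate result that is gone."""


class ToyCluster:
    """Hypothetical stand-in for a cluster holding cached intermediate results."""

    def __init__(self):
        self.cache = {}        # cache id -> intermediate result
        self.submissions = []  # log of which plan was submitted, in order

    def run_full_dag(self, cache_id, source):
        # Running the full DAG recomputes the intermediate result; storing it
        # in the cache is only a byproduct of that run.
        self.submissions.append("full")
        intermediate = [x * 2 for x in source]
        self.cache[cache_id] = intermediate
        return sum(intermediate)

    def run_with_cache(self, cache_id):
        # The shortened plan reads the intermediate result instead of
        # recomputing it, and fails if the result is no longer available.
        self.submissions.append("cached")
        if cache_id not in self.cache:
            raise CacheMissError(cache_id)
        return sum(self.cache[cache_id])


def submit(cluster, cache_id, source):
    """Try the cached plan once; on a cache miss, fall back to the full DAG.

    There is deliberately no cache-specific retry counter: a failure of the
    full-DAG run propagates like the failure of any ordinary job.
    """
    try:
        return cluster.run_with_cache(cache_id)
    except CacheMissError:
        return cluster.run_full_dag(cache_id, source)


cluster = ToyCluster()
source = [1, 2, 3]
first = cluster.run_full_dag("t1", source)  # job 1 generates the cache
second = submit(cluster, "t1", source)      # job 2 reads the cache
cluster.cache.clear()                       # e.g. a TM failure loses the result
third = submit(cluster, "t1", source)       # job 2 falls back, cache is rebuilt
```

In this sketch the only cache-specific logic is the single fallback in submit; anything that goes wrong inside run_full_dag would surface exactly as it would for job 1, which is the point Becket raised about not needing a separate retry config.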