Hi Xuannan,

> I am not entirely sure if I understand the cases you mentioned. The users
> can use the cached table object returned by the .cache() method in other
> jobs and it should read the intermediate result. The intermediate result can
> be gone in the following three cases: 1. the user explicitly calls the
> invalidateCache() method 2. the TableEnvironment is closed 3. a failure
> happens on the TM. When that happens, the intermediate result will not be
> available unless it is re-generated.
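
As a rough illustration of how I read these cases, the sketch below models a cached result that is transparently re-generated on read once it is gone. CachedTable, invalidate_cache, lose_partition and read are hypothetical names used only for illustration, not FLIP-36 or Flink API, and case 2 (closing the TableEnvironment) is not modeled.

```python
# Toy model only: CachedTable and its methods are hypothetical names,
# not part of the Flink Table API or of FLIP-36.
class CachedTable:
    def __init__(self, compute):
        self._compute = compute   # function that re-runs the original job
        self._result = None       # cached intermediate result; None when gone
        self.generations = 0      # how many times the result was produced

    def invalidate_cache(self):
        """Case 1: the user drops the cached result explicitly."""
        self._result = None

    def lose_partition(self):
        """Case 3: simulate a TM failure that destroys the cached result."""
        self._result = None

    def read(self):
        """Return the cached result, re-generating it first if it is gone."""
        if self._result is None:
            self._result = self._compute()
            self.generations += 1
        return self._result


table = CachedTable(lambda: [1, 2, 3])
first = table.read()       # first read produces the intermediate result
table.lose_partition()     # the result is lost, e.g. after a TM failure
second = table.read()      # the read falls back to re-generation
```

In this model a reader never needs to retry: a missing result is simply produced again on the next read.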

What confuses me is why we need a *cache.retries.max* config. Shouldn't a
missing intermediate result always be re-generated automatically once it is
gone?

Thanks,

Jiangjie (Becket) Qin

On Fri, Apr 24, 2020 at 3:59 PM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi Becket,
>
> Thanks for the comments.
>
> On Fri, Apr 24, 2020 at 9:12 AM Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Xuannan,
> >
> > Thanks for picking up the FLIP. It looks good to me overall. Some quick
> > comments / questions below:
> >
> > 1. Do we also need changes in the Java API?
>
> Yes, the public interface changes to Table and TableEnvironment should be
> made in the Java API.
>
> > 2. What are the cases that users may want to retry reading the
> > intermediate result? It seems that once the intermediate result has gone,
> > it will not be available later without being generated again, right?
>
> I am not entirely sure if I understand the cases you mentioned. The users
> can use the cached table object returned by the .cache() method in other
> jobs and it should read the intermediate result. The intermediate result can
> be gone in the following three cases: 1. the user explicitly calls the
> invalidateCache() method 2. the TableEnvironment is closed 3. a failure
> happens on the TM. When that happens, the intermediate result will not be
> available unless it is re-generated.
>
> > 3. In the "semantic of cache() method" section, the description "The
> > semantic of the *cache()* method is a little different depending on
> > whether auto caching is enabled or not." seems not explained.
>
> This line is actually outdated and should be removed, as we are not adding
> the auto caching functionality in this FLIP. Auto caching will be added in
> the future, and the semantic of cache() when auto caching is enabled will
> be discussed in detail by a new FLIP. I will remove the description to avoid
> further confusion.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Apr 22, 2020 at 4:00 PM Xuannan Su <suxuanna...@gmail.com> wrote:
> >
> > > Hi folks,
> > >
> > > I'd like to start the discussion about FLIP-36 Support Interactive
> > > Programming in Flink Table API
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> > >
> > > The FLIP proposes to add support for interactive programming in the Flink
> > > Table API. Specifically, it lets users cache the intermediate results
> > > (tables) and use them in later jobs.
> > >
> > > Even though the FLIP has been discussed in the past[1], it hasn't
> > > formally passed the vote yet. And some of the design and implementation
> > > details have to change to incorporate the cluster partitions proposed in
> > > FLIP-67[2].
> > >
> > > Looking forward to your feedback.
> > >
> > > Thanks,
> > > Xuannan
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/b372fd7b962b9f37e4dace3bc8828f6e2a2b855e56984e58bc4a413f@%3Cdev.flink.apache.org%3E
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-67%3A+Cluster+partitions+lifecycle