Hi folks,

I just want to revive this discussion thread. A few of us had some offline discussions around the implementation details of this FLIP.
Here I briefly summarize the offline discussion:

--
Some concerns were raised about the default implementation of the cache service.

1. The default cache service introduces a separate service in the Flink runtime, which seems complicated, especially when things like colocation are needed.
2. Using a Flink job to run the default cache service may expose unnecessary implementation details to users (e.g. it may take up slots and resources).
3. Sharing the persistent shuffle in the network stack may need additional work in the runtime.

To address the above concerns, we would like to make some changes to the current FLIP proposal. In general, we agreed that our primary goal is to unify the storage tier of the default shuffle service and the default intermediate result storage.

Stephan gave some valuable suggestions on how to improve the current FLIP design and align it with the efforts of FLIP-31. Some highlights are:

1. Unify the storage tier of the default shuffle service and the default intermediate result storage in the network stack.
2. We need both internal (default) and external services for Shuffle and Intermediate Result. The internal (default) implementation is for the out-of-the-box user experience. The external service is for more sophisticated use cases.
3. Having two interfaces, *ShuffleService* and *IntermediateResultStorage* (for explicit cache handling). The internal default network-stack-based solution implements both interfaces.

--
As a result of these discussions, we would like to add a few more things to the current FLIP-36. More specifically:

1. A pluggable IntermediateResultStorage interface (for explicit cache handling).
2. A mechanism to enable references to intermediate results (persisted shuffle and explicit cache) across jobs.
3. A stack to manage intermediate result metadata (persisted shuffle and explicit cache) in the runtime.

The detailed design is explained in the following doc. The doc is mostly about the implementation of the default intermediate result storage.
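
To make highlight 3 above a bit more concrete, here is a rough Java sketch of one default implementation sitting behind both interfaces. Only the interface names ShuffleService and IntermediateResultStorage come from the discussion. Every method signature, the NetworkStackStorage class, the in-memory map standing in for the network stack, and the key prefixes are hypothetical and only for illustration; the actual design is whatever the doc ends up describing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical signature: only the interface name comes from the discussion.
interface ShuffleService {
    void writePartition(String partitionId, byte[] data);
    Optional<byte[]> readPartition(String partitionId);
}

// Hypothetical signature: explicit cache handling for intermediate results.
interface IntermediateResultStorage {
    void cache(String resultId, byte[] data);
    Optional<byte[]> lookup(String resultId);
    void release(String resultId);
}

// Hypothetical default implementation: one storage tier serving both interfaces.
// A plain map stands in for the network stack here (an assumption of this sketch).
final class NetworkStackStorage implements ShuffleService, IntermediateResultStorage {
    private final Map<String, byte[]> store = new HashMap<>();

    @Override
    public void writePartition(String partitionId, byte[] data) {
        store.put("shuffle/" + partitionId, data.clone());
    }

    @Override
    public Optional<byte[]> readPartition(String partitionId) {
        return Optional.ofNullable(store.get("shuffle/" + partitionId));
    }

    @Override
    public void cache(String resultId, byte[] data) {
        store.put("cache/" + resultId, data.clone());
    }

    @Override
    public Optional<byte[]> lookup(String resultId) {
        return Optional.ofNullable(store.get("cache/" + resultId));
    }

    @Override
    public void release(String resultId) {
        store.remove("cache/" + resultId);
    }
}

class CacheSketchDemo {
    public static void main(String[] args) {
        NetworkStackStorage storage = new NetworkStackStorage();

        // The same object is used through either interface.
        ShuffleService shuffle = storage;
        IntermediateResultStorage results = storage;

        shuffle.writePartition("partition-0", new byte[] {1, 2, 3});
        results.cache("cached-table-1", new byte[] {42});

        System.out.println(shuffle.readPartition("partition-0").isPresent());
        System.out.println(results.lookup("cached-table-1").isPresent());

        results.release("cached-table-1");
        System.out.println(results.lookup("cached-table-1").isPresent());
    }
}
```

An external service would, under the same assumptions, just be another class implementing one or both interfaces, which is the pluggability that point 1 of the additions asks for.
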
API-wise, it is an addition to the existing Table API change proposed in the FLIP.

https://docs.google.com/document/d/17twjcQn70rJnVCXcr74AL44HY3jLeT1leC9rAFsluFg/edit#

I'll update the FLIP-36 wiki to reflect the new proposal. But we can probably use the Google Doc for discussion right now while I am updating the FLIP wiki.

Thanks,

Jiangjie (Becket) Qin

On Thu, Mar 14, 2019 at 9:28 PM Becket Qin <becket....@gmail.com> wrote:

> Thanks Piotr, for the +1 and all the patient discussion :)
>
> On Wed, Mar 13, 2019 at 3:53 PM Piotr Nowojski <pi...@ververica.com>
> wrote:
>
>> Hi Becket,
>>
>> Thank you for driving the effort and writing down the detailed proposal.
>> To me this FLIP looks good and it has +1 from me.
>>
>> Piotr Nowojski
>>
>> > On 12 Mar 2019, at 13:21, Becket Qin <becket....@gmail.com> wrote:
>> >
>> > Hi folks,
>> >
>> > We would like to start the discussion thread about FLIP-36 support
>> > interactive programming in Flink Table API.
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>> >
>> > There has been an extended discussion[1] in the mailing list. To quick
>> > recap, we propose to add capability of caching intermediate results in
>> user
>> > applications for later usage.
>> >
>> > Feedback and comments are welcome!
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > [1]
>> >
>> http://mail-archives.apache.org/mod_mbox/flink-dev/201811.mbox/%3ccabtagwernr8otamdt4f-mfzr5s956k530+nxt2s7ieh4i4g...@mail.gmail.com%3E
>>
>>