Hi folks,

I'd like to revive this discussion thread. A few of us had some offline
discussions about the implementation details of this FLIP.

Here I briefly summarize the offline discussion:

--
Some concerns were raised about the default implementation of the cache service.
1. The default cache service introduces a separate service in the Flink
runtime, which seems complicated, especially when things like colocation
are needed.
2. Using a Flink job to run the default cache service may expose
unnecessary implementation details to users (e.g., it may occupy slots and
resources).
3. Sharing the persistent shuffle in the network stack may require
additional work in the runtime.

To address the above concerns, we would like to make some changes to the
current FLIP proposal.

In general, we agreed that our primary goal is to unify the storage tier of
the default shuffle service and the default intermediate result storage.

Stephan gave some valuable suggestions on how to improve the current FLIP
design and align it with the efforts of FLIP-31. Some highlights are:
  1. Unify the storage tier of the default shuffle service and the default
intermediate result storage on the network stack.
  2. We need both internal (default) and external services for shuffle and
intermediate results. The internal (default) implementation provides the
out-of-the-box user experience; the external services are for more
sophisticated use cases.
  3. Have two interfaces, *ShuffleService* and *IntermediateResultStorage*
(for explicit cache handling). The internal default network-stack-based
solution implements both interfaces.
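To make point 3 concrete, here is a minimal Java sketch of one default implementation serving both roles. All names in it (UnifiedStorageSketch, NetworkStackStorage, writePartition, persist, onJobFinished, string result IDs) are hypothetical and only illustrate the idea; they are not Flink's actual shuffle API, and an in-memory map stands in for the network stack.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch only: these types are illustrative and are NOT Flink's actual API.
public class UnifiedStorageSketch {

    // Shuffle role: the runtime writes and reads result partitions between tasks.
    interface ShuffleService {
        void writePartition(String resultId, byte[] data);
        Optional<byte[]> readPartition(String resultId);
    }

    // Explicit cache role: keeps a result around after the producing job finishes.
    interface IntermediateResultStorage {
        void persist(String resultId);
        boolean isAvailable(String resultId);
        void release(String resultId);
    }

    // Default implementation: a single in-memory store stands in for the network stack,
    // so shuffle data and cached results share one storage tier.
    static final class NetworkStackStorage implements ShuffleService, IntermediateResultStorage {
        private final Map<String, byte[]> partitions = new HashMap<>();
        private final Set<String> persisted = new HashSet<>();

        @Override
        public void writePartition(String resultId, byte[] data) {
            partitions.put(resultId, data.clone());
        }

        @Override
        public Optional<byte[]> readPartition(String resultId) {
            return Optional.ofNullable(partitions.get(resultId)).map(byte[]::clone);
        }

        @Override
        public void persist(String resultId) {
            if (!partitions.containsKey(resultId)) {
                throw new IllegalStateException("No partition for " + resultId);
            }
            persisted.add(resultId);
        }

        @Override
        public boolean isAvailable(String resultId) {
            return persisted.contains(resultId);
        }

        @Override
        public void release(String resultId) {
            persisted.remove(resultId);
            partitions.remove(resultId);
        }

        // When a job finishes, ordinary shuffle data is dropped; persisted (cached) results survive.
        void onJobFinished(Collection<String> producedResultIds) {
            for (String id : producedResultIds) {
                if (!persisted.contains(id)) {
                    partitions.remove(id);
                }
            }
        }
    }

    public static void main(String[] args) {
        NetworkStackStorage storage = new NetworkStackStorage();
        storage.writePartition("shuffle-1", new byte[] {1, 2});
        storage.writePartition("cached-1", new byte[] {3});
        storage.persist("cached-1");
        storage.onJobFinished(List.of("shuffle-1", "cached-1"));
        System.out.println(storage.isAvailable("cached-1"));              // true
        System.out.println(storage.readPartition("shuffle-1").isPresent()); // false
    }
}
```

The point of the sketch is that, under this assumption, a cached result is simply a shuffle partition that is not dropped when its job ends, which is why one storage tier can back both interfaces.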
--

As a result of these discussions, we would like to add a few more things to
the current FLIP-36. More specifically:
1. A pluggable IntermediateResultStorage interface (for explicit cache
handling).
2. A mechanism that allows intermediate results (persisted shuffle and
explicit cache) to be referenced across jobs.
3. A stack to manage intermediate result metadata (persisted shuffle and
explicit cache) in the runtime.
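Items 2 and 3 could look roughly like the following Java sketch: a runtime-side registry that outlives individual jobs, so a later job can look up a result produced by an earlier one. Everything here (IntermediateResultRegistry, ResultMetadata, Kind, the location string format, and the "recompute on a miss" behavior) is a hypothetical illustration, not the design in the doc.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: these types are illustrative and are NOT Flink's actual API.
public class IntermediateResultRegistrySketch {

    enum Kind { PERSISTED_SHUFFLE, EXPLICIT_CACHE }

    // Metadata a later job needs to locate and consume a result produced by an earlier job.
    static final class ResultMetadata {
        final String resultId;
        final String producerJobId;
        final Kind kind;
        final String location; // assumed opaque address in the storage tier

        ResultMetadata(String resultId, String producerJobId, Kind kind, String location) {
            this.resultId = resultId;
            this.producerJobId = producerJobId;
            this.kind = kind;
            this.location = location;
        }
    }

    // Registry kept by the runtime across jobs, so results can be referenced by ID later.
    static final class IntermediateResultRegistry {
        private final Map<String, ResultMetadata> results = new ConcurrentHashMap<>();

        void register(ResultMetadata meta) {
            ResultMetadata previous = results.putIfAbsent(meta.resultId, meta);
            if (previous != null) {
                throw new IllegalStateException("Already registered: " + meta.resultId);
            }
        }

        // A consuming job looks up a result by ID; in this sketch an empty Optional means "recompute".
        Optional<ResultMetadata> lookup(String resultId) {
            return Optional.ofNullable(results.get(resultId));
        }

        // Drop a result, e.g. when the storage tier loses it or the user releases a cache.
        boolean invalidate(String resultId) {
            return results.remove(resultId) != null;
        }
    }

    public static void main(String[] args) {
        IntermediateResultRegistry registry = new IntermediateResultRegistry();
        registry.register(new ResultMetadata("cached-table-1", "job-A", Kind.EXPLICIT_CACHE, "tm-1:partition-7"));
        // A later job reuses the result instead of recomputing it.
        System.out.println(registry.lookup("cached-table-1").map(m -> m.producerJobId).orElse("recompute")); // job-A
        registry.invalidate("cached-table-1");
        System.out.println(registry.lookup("cached-table-1").isPresent()); // false
    }
}
```

Keeping only metadata in the registry, and the data itself in the storage tier, is one plausible way to separate item 3 (metadata management) from item 1 (the pluggable storage).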

The detailed design is explained in the following doc. The doc is mostly
about the implementation of the default intermediate result storage. API-wise,
it is an addition to the existing Table API changes proposed in the FLIP.

https://docs.google.com/document/d/17twjcQn70rJnVCXcr74AL44HY3jLeT1leC9rAFsluFg/edit#

I'll update the FLIP-36 wiki to reflect the new proposal. In the meantime,
we can use the Google Doc for discussion while I am updating the wiki.

Thanks,

Jiangjie (Becket) Qin

On Thu, Mar 14, 2019 at 9:28 PM Becket Qin <becket....@gmail.com> wrote:

> Thanks Piotr, for the +1 and all the patient discussion :)
>
> On Wed, Mar 13, 2019 at 3:53 PM Piotr Nowojski <pi...@ververica.com>
> wrote:
>
>> Hi Becket,
>>
>> Thank you for driving the effort and writing down the detailed proposal.
>> To me this FLIP looks good and it has +1 from me.
>>
>> Piotr Nowojski
>>
>> > On 12 Mar 2019, at 13:21, Becket Qin <becket....@gmail.com> wrote:
>> >
>> > Hi folks,
>> >
>> > We would like to start the discussion thread about FLIP-36: Support
>> > Interactive Programming in Flink Table API.
>> >
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>> >
>> > There has been an extended discussion[1] on the mailing list. To quickly
>> > recap, we propose to add the capability of caching intermediate results in
>> > user applications for later usage.
>> >
>> > Feedback and comments are welcome!
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > [1]
>> > http://mail-archives.apache.org/mod_mbox/flink-dev/201811.mbox/%3ccabtagwernr8otamdt4f-mfzr5s956k530+nxt2s7ieh4i4g...@mail.gmail.com%3E
>>
>>
