Sorry for joining so late. Generally speaking I like this idea very much!
I have one idea about the StateBackend interface. Could we instead of adding a flag method boolean isLegacyStateBackend remove the checkpointstorage related methods from StateBackend right away? The old/legacy implementations could then implement both StateBackend and SnapshotStorage. In turn in the method env.setStateBackend we could do: setStateBackend(StateBackend backend) { this.stateBackend = backend; if (backend instanceof SnapshotStorage) { this.setSnapshotStorage(backend); } } This has the benefit that we could already get rid off the methods from StateBackend which would be problematic in the new implementations (such as e.g. HashMapStateBackend - what would you return there? null?). I know this would break the interface, but StateBackend is actually quite internal, we did it quite freely in the past, and I don't think there are many custom state implementation in the wild. And even if there are some the workaround is as easy as simply adding implements SnapshotStorage. Best, Dawid On 11/09/2020 16:48, Aljoscha Krettek wrote: > I could try and come up with a longer name if you need it ... ;-) > > Aljoscha > > On 11.09.20 16:25, Seth Wiesman wrote: >> Having thought about it more, HashMapStateBackend has won me over. I'll >> update the FLIP. If there aren't any more comments I'll open it up for >> voting on monday. >> >> Seth >> >> On Wed, Sep 9, 2020 at 9:09 AM Seth Wiesman <sjwies...@gmail.com> wrote: >> >>> @Yun yes, this is really about making CheckpointStorage an orthogonal >>> concept. I think we can remain pragmatic and keep state-backend >>> specific >>> configurations (async, incremental, etc) in the state backend >>> themselves. I >>> view these as more advanced configurations and by the time someone is >>> changing the defaults they likely understand what is going on. My >>> goal is >>> to help on-board users and so long as each state backend has a no-arg >>> default constructor that works for many users I think we've achieved >>> that >>> goal. >>> >>> Regarding the checkpoint coordinator, that makes sense but I will >>> consider >>> out of the scope of this FLIP. I want to focus on simplifying APIs. >>> >>> @Aljoscha Krettek <aljos...@apache.org> >>> >>> My feeling is that state backends and checkpointing are going to be >>> integral to Flink for many years, regardless or other enhancements >>> so this >>> change is still valuable. >>> >>> Since this is a FLIP about improving the user api I'm happy to bikeshed >>> the names a little more than normal. HashMap makes sense, my other >>> thought >>> was InMemory. >>> >>> Seth >>> >>> >>> >>> On Wed, Sep 9, 2020 at 8:04 AM Aljoscha Krettek <aljos...@apache.org> >>> wrote: >>> >>>> I like it a lot! >>>> >>>> I think it makes sense to clean this up despite the planned new >>>> fault-tolerance mechanisms. In the future, users will decide which >>>> mechanism to use and I can imagine that a lot of them will keep using >>>> the current mechanism for quite a while to come. But I'm happy to >>>> yield >>>> to Stephan's opinion here, he knows more about the progress of that >>>> work. >>>> >>>> The one nitpick I have is about naming: will users understand >>>> OnHeapStateBackend? I mean, do they know what on-heap/off-heap >>>> memory is >>>> and the tradeoffs? An alternative could be HashMapStateBackend, >>>> because >>>> that's essentially what it is. I wouldn't block anything on this, >>>> though. >>>> >>>> Aljoscha >>>> >>>> On 09.09.20 10:05, Konstantin Knauf wrote: >>>>> Thanks for the initiative. Big +1. Would be interested to hear if the >>>>> proposed interfaces still make sense in the face of the new >>>> fault-tolerance >>>>> work that is planned. Stephan/Piotr will know. >>>>> >>>>> On Tue, Sep 8, 2020 at 7:05 PM Seth Wiesman <sjwies...@gmail.com> >>>> wrote: >>>>> >>>>>> Hi Devs, >>>>>> >>>>>> I'd like to propose an update to how state backends and checkpoint >>>> storage >>>>>> are configured to help users better understand Flink. >>>>>> >>>>>> Apache Flink's durability story is a mystery to many users. One >>>>>> of the >>>> most >>>>>> common recurring questions from users comes from not >>>>>> understanding the >>>>>> relationship between state, state backends, and snapshots. Some >>>>>> of this >>>>>> confusion can be abated with learning material but the question >>>>>> is so >>>>>> pervasive that we believe Flink’s user APIs should be better >>>> communicate >>>>>> what different components are responsible for. >>>>>> >>>>>> >>>>>> >>>>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-142%3A+Disentangle+StateBackends+from+Checkpointing >>>> >>>>>> >>>>>> >>>>>> I look forward to a healthy discussion. >>>>>> >>>>>> >>>>>> Seth >>>>>> >>>>> >>>>> >>>> >>>> >> >
signature.asc
Description: OpenPGP digital signature