I am confused by the concerns raised here. My view is very much from the
user perspective (which is partially also the developer perspective, and
that overlap is the sign of an intuitive abstraction).

Of course, there will be docs describing what JMCheckpointStorage and
FsCheckpointStorage are.
And having release notes that describe that RocksDBStateBackend("s3://...")
now corresponds to a combination of "RocksDBBackend" and
"FsCheckpointStorage" is also straightforward.

We said to keep the old RocksDBStateBackend class and let it implement both
interfaces such that the old code still works exactly as before.
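
To make the compatibility argument concrete, here is a minimal Java sketch.
All types below are hypothetical stand-ins written for illustration, not the
actual Flink classes or signatures. In particular, the rule that
setStateBackend() also adopts the checkpoint storage when its argument
implements both interfaces is an assumption about how the legacy path could
be wired, the JobManager storage default is assumed, and the path is a
placeholder.

```java
import java.util.Objects;

// Hypothetical interfaces modelling the proposed split (not the Flink API).
interface StateBackend {
    String localStateStructure(); // how operators hold their working state
}

interface CheckpointStorage {
    String checkpointLocation(); // where checkpoints are persisted
}

final class HeapBackend implements StateBackend {
    @Override public String localStateStructure() { return "heap"; }
}

final class RocksDBBackend implements StateBackend {
    @Override public String localStateStructure() { return "rocksdb"; }
}

final class JMCheckpointStorage implements CheckpointStorage {
    @Override public String checkpointLocation() { return "jobmanager-memory"; }
}

final class FsCheckpointStorage implements CheckpointStorage {
    private final String path;
    FsCheckpointStorage(String path) { this.path = Objects.requireNonNull(path); }
    @Override public String checkpointLocation() { return path; }
}

// Legacy class kept for compatibility: it implements both interfaces by
// delegating to the two new building blocks.
final class LegacyRocksDBStateBackend implements StateBackend, CheckpointStorage {
    private final StateBackend backend = new RocksDBBackend();
    private final CheckpointStorage storage;
    LegacyRocksDBStateBackend(String path) { this.storage = new FsCheckpointStorage(path); }
    @Override public String localStateStructure() { return backend.localStateStructure(); }
    @Override public String checkpointLocation() { return storage.checkpointLocation(); }
}

// Hypothetical job configuration holding the two independent choices.
final class JobConfig {
    private StateBackend stateBackend = new HeapBackend();                    // default from the thread
    private CheckpointStorage checkpointStorage = new JMCheckpointStorage(); // assumed default

    void setStateBackend(StateBackend backend) {
        this.stateBackend = Objects.requireNonNull(backend);
        // Assumption: a legacy backend that is also a CheckpointStorage
        // configures both aspects, so old code behaves as before.
        if (backend instanceof CheckpointStorage) {
            this.checkpointStorage = (CheckpointStorage) backend;
        }
    }

    void setCheckpointStorage(CheckpointStorage storage) {
        this.checkpointStorage = Objects.requireNonNull(storage);
    }

    String describe() {
        return "state=" + stateBackend.localStateStructure()
                + ", checkpoints=" + checkpointStorage.checkpointLocation();
    }
}

final class SplitSketch {
    public static void main(String[] args) {
        String path = "file:///tmp/checkpoints"; // placeholder path

        JobConfig legacy = new JobConfig();
        legacy.setStateBackend(new LegacyRocksDBStateBackend(path));

        JobConfig split = new JobConfig();
        split.setStateBackend(new RocksDBBackend());
        split.setCheckpointStorage(new FsCheckpointStorage(path));

        // Both configurations describe the same effective setup.
        System.out.println(legacy.describe());
        System.out.println(split.describe());
    }
}
```

In this sketch the old one-argument call and the new two-call form end up
with the same effective configuration, which is the sense in which existing
code would keep working while new code states the two choices separately.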

What new confusion would be introduced here?
Understanding the difference between JMCheckpointStorage and
FsCheckpointStorage was always necessary when one needed to understand the
difference between MemoryStateBackend and FsStateBackend. It should be
easier to explain after this change, because it becomes the only thing we
describe when explaining which checkpoint storage to use (rather than also
having the choice of index structure coupled to it).


On Wed, Sep 23, 2020 at 10:39 AM Aljoscha Krettek <aljos...@apache.org>
wrote:

> On 23.09.20 04:40, Yu Li wrote:
> > To be specific, with the old API users don't need to set checkpoint
> > storage, instead they only need to pass the checkpoint path w/o caring
> > about the storage. The new APIs are forcing users to set the storage so
> > they have to know the difference between different storages. It's not an
> > implementation change, but an API change that users have to understand
> > and follow up on.
>
> I think the main point of the FLIP is to make it more obvious to users
> what is happening.
>
> With current Flink, they would do a `setStateBackend(new
> FsStateBackend(<path>))`. What the user is actually "saying" with this
> is: I want to keep state on heap but store checkpoints in DFS. They are
> not actually changing the "State Backend", the thing that keeps state in
> operators, but only where state is checkpointed. The thing that is used
> for local state storage in operators is still the "Heap Backend".
>
> With the proposed FLIP, a user would do a `setCheckpointStorage(new
> FsStorage(<path>))`. Which makes it obvious that they're changing where
> checkpoints are stored but not the actual "State Backend", which is
> still "Heap Backend" (the default).
>
> I do understand Yu's point, though, that this will be confusing for
> current Flink users. They are used to setting a "State Backend" if/when
> they want to change the storage location. To fit the new model they
> would have to change the call from `setStateBackend()` to
> `setCheckpointStorage()`.
>
> I think we need to live with this short-term confusion because in the
> long run the proposed split between checkpoint location and state
> backend makes sense and will be more straightforward for users to
> understand.
>
> Best,
> Aljoscha
>
>
