Re: StateBackend

Matthias J. Sax Tue, 16 Feb 2016 02:54:01 -0800

Thanks for the input.

Just to clearly my understanding: by "Flink-embedded [...] scale out as
the Flink job scales out", you mean that each TM hosts an embedded state
backend service, ie, all those instances form a logically single but
distributed backend service? How is is ensure, that the state is store
reliable (ie, not on the same machine the state belongs to)? Is this
handled by the service automatically, or is it Flink's responsibility?


What do you mean by "They work nicely with savepoints, because every
Flink job has a copy of the state"?

The classification itself makes sense. I guess, we should reflect this
in the documentation. Not sure, if the code can/should reflect this -- I
doubt it.

-Matthias


On 02/16/2016 10:32 AM, Stephan Ewen wrote:
> I think this is actually a pretty good question. Right now, there are two
> different types of state backends:
> 
>   (1) Flink-embedded. They are independent of external services, scale out
> as the Flink job scales out, and are really mainly a way of storing and
> backuping key/value state.
>         For example: MemoryStateBackend, FsStateBackend, RocksDBStateBackend
>         They work nicely with savepoints, because every Flink job has a
> copy of the state.
> 
>   (2) Flink-connected:The state is outside Flink. The systems need to run
> separately, don't scale with Flink.
>        Examples: DBStateBackend
>        One advantage they have currently is that state in Flink is small,
> so checkpoints and restore are very cheap.
> 
> 
> I think we should start classifying the state backends like this.
> 
> 
> Greetings,
> Stephan
> 
> 
> On Mon, Feb 15, 2016 at 3:11 PM, Aljoscha Krettek <[email protected]>
> wrote:
> 
>> Hi,
>> sorry about not answering but I wanted to wait since I already voiced my
>> opinion on the PR.
>>
>> I think it is better to assume an already running redis because it is
>> easier to work around clashes in running redis instances (ports, data
>> directory, and such). Then, however, care needs to be taken to make sure
>> that the state inside the one redis instance does not clash.
>>
>> Cheers,
>> Aljoscha
>>> On 15 Feb 2016, at 14:53, Matthias J. Sax <[email protected]> wrote:
>>>
>>> Anyone?
>>>
>>> Otherwise, I will suggest to move forward with the PR using the
>>> assumption that Redis must be started manually.
>>>
>>> -Matthias
>>>
>>> On 02/11/2016 08:28 PM, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> In Flink it is possible to have different backends for operator state. I
>>>> am wondering what the best approach for different state backends would
>> be.
>>>>
>>>> Let's assume the backend is a database server. The following questions
>>>> arise:
>>>>  - Should the database server be started manually by the user or can
>>>> Flink start the server automatically it used?
>>>>    (this seems to be the approach for RocksDB as embedded servers)
>>>>  - Should each job use the same or individual backup server (or maybe a
>>>> mix of both?)
>>>>
>>>> I personally think, that Flink should not start-up a backup server but
>>>> assume that it is available when the job is submitted. This allows the
>>>> user also the start up multiple instances of the backup server and
>>>> choose which one to use for each job individually.
>>>>
>>>> What do you think about it? I ask because of the current PR for Redis as
>>>> StateBackend:
>>>> https://github.com/apache/flink/pull/1617
>>>>
>>>> There is no embedded mode for Redis as for RocksDB.
>>>>
>>>> -Matthias
>>>>
>>>
>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: StateBackend

Reply via email to