I think this is actually a pretty good question. Right now, there are two
different types of state backends:

  (1) Flink-embedded: They are independent of external services, scale out
as the Flink job scales out, and are mainly a way of storing and backing up
key/value state.
       Examples: MemoryStateBackend, FsStateBackend, RocksDBStateBackend
       They work nicely with savepoints, because every Flink job has a
copy of the state.

  (2) Flink-connected: The state is outside Flink. The systems need to run
separately and don't scale with Flink.
       Example: DBStateBackend
       One advantage they currently have is that the state held in Flink is
small, so checkpoints and restores are very cheap.


I think we should start classifying the state backends like this.
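
To make the split concrete, here is a minimal sketch of the classification
in Java. The enum and helper class are hypothetical and exist only for
illustration; they are not part of Flink's API. Only the backend names are
taken from the lists above.

```java
// Hypothetical illustration only; none of these types exist in Flink.
enum BackendKind {
    EMBEDDED,   // state lives inside the Flink job and scales out with it
    CONNECTED   // state lives in an external system that runs separately
}

final class BackendClassification {
    private BackendClassification() {}

    // Classify the backends named in this thread by their simple class name.
    static BackendKind classify(String backendName) {
        switch (backendName) {
            case "MemoryStateBackend":
            case "FsStateBackend":
            case "RocksDBStateBackend":
                return BackendKind.EMBEDDED;
            case "DBStateBackend":
                return BackendKind.CONNECTED;
            default:
                throw new IllegalArgumentException("Unknown backend: " + backendName);
        }
    }

    // Connected backends depend on a service that is run separately from Flink.
    static boolean requiresExternalService(BackendKind kind) {
        return kind == BackendKind.CONNECTED;
    }
}
```

Under this sketch, a backend without an embedded mode, such as the Redis one
discussed below, would fall into the CONNECTED category.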


Greetings,
Stephan


On Mon, Feb 15, 2016 at 3:11 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> sorry about not answering but I wanted to wait since I already voiced my
> opinion on the PR.
>
> I think it is better to assume an already running Redis because it is
> easier to work around clashes between running Redis instances (ports, data
> directory, and such). Then, however, care needs to be taken to make sure
> that the state inside the one Redis instance does not clash.
>
> Cheers,
> Aljoscha
> > On 15 Feb 2016, at 14:53, Matthias J. Sax <mj...@apache.org> wrote:
> >
> > Anyone?
> >
> > Otherwise, I suggest moving forward with the PR using the
> > assumption that Redis must be started manually.
> >
> > -Matthias
> >
> > On 02/11/2016 08:28 PM, Matthias J. Sax wrote:
> >> Hi,
> >>
> >> In Flink it is possible to have different backends for operator state. I
> >> am wondering what the best approach for different state backends would
> >> be.
> >>
> >> Let's assume the backend is a database server. The following questions
> >> arise:
> >>  - Should the database server be started manually by the user or can
> >> Flink start the server automatically when it is used?
> >>    (this seems to be the approach for RocksDB as embedded servers)
> >>  - Should each job use the same or individual backup server (or maybe a
> >> mix of both?)
> >>
> >> I personally think that Flink should not start up a backup server but
> >> assume that it is available when the job is submitted. This also allows
> >> the user to start up multiple instances of the backup server and
> >> choose which one to use for each job individually.
> >>
> >> What do you think about it? I ask because of the current PR for Redis as
> >> StateBackend:
> >> https://github.com/apache/flink/pull/1617
> >>
> >> There is no embedded mode for Redis as for RocksDB.
> >>
> >> -Matthias
> >>
> >
>
>
