I think this is actually a pretty good question. Right now, there are two different types of state backends:
(1) Flink-embedded. They are independent of external services, scale out as the Flink job scales out, and are really mainly a way of storing and backuping key/value state. For example: MemoryStateBackend, FsStateBackend, RocksDBStateBackend They work nicely with savepoints, because every Flink job has a copy of the state. (2) Flink-connected:The state is outside Flink. The systems need to run separately, don't scale with Flink. Examples: DBStateBackend One advantage they have currently is that state in Flink is small, so checkpoints and restore are very cheap. I think we should start classifying the state backends like this. Greetings, Stephan On Mon, Feb 15, 2016 at 3:11 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > sorry about not answering but I wanted to wait since I already voiced my > opinion on the PR. > > I think it is better to assume an already running redis because it is > easier to work around clashes in running redis instances (ports, data > directory, and such). Then, however, care needs to be taken to make sure > that the state inside the one redis instance does not clash. > > Cheers, > Aljoscha > > On 15 Feb 2016, at 14:53, Matthias J. Sax <mj...@apache.org> wrote: > > > > Anyone? > > > > Otherwise, I will suggest to move forward with the PR using the > > assumption that Redis must be started manually. > > > > -Matthias > > > > On 02/11/2016 08:28 PM, Matthias J. Sax wrote: > >> Hi, > >> > >> In Flink it is possible to have different backends for operator state. I > >> am wondering what the best approach for different state backends would > be. > >> > >> Let's assume the backend is a database server. The following questions > >> arise: > >> - Should the database server be started manually by the user or can > >> Flink start the server automatically it used? > >> (this seems to be the approach for RocksDB as embedded servers) > >> - Should each job use the same or individual backup server (or maybe a > >> mix of both?) > >> > >> I personally think, that Flink should not start-up a backup server but > >> assume that it is available when the job is submitted. This allows the > >> user also the start up multiple instances of the backup server and > >> choose which one to use for each job individually. > >> > >> What do you think about it? I ask because of the current PR for Redis as > >> StateBackend: > >> https://github.com/apache/flink/pull/1617 > >> > >> There is no embedded mode for Redis as for RocksDB. > >> > >> -Matthias > >> > > > >