Hi Wolfe,

that's all correct. Thank you!

I'd like to emphasize that the FsStateBackend stores all state on the heap
of the worker JVM. So you might run into OutOfMemoryErrors if your state
grows too large.
Therefore, the RocksDBStateBackend is the recommended choice for most
production use cases.
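
As a rough illustration only (a sketch assuming the Flink 1.2 configuration
keys; the namenode address and path below are placeholders, not values from
this thread), selecting RocksDB in flink-conf.yaml might look like this:

```yaml
# Assumption: Flink 1.2 key names. Keeps in-flight state in RocksDB on
# local disk instead of on the JVM heap.
state.backend: rocksdb

# Assumption: durable checkpoint location. Placeholder URI; any filesystem
# reachable by all TaskManagers should work (HDFS, S3, a shared drive).
state.backend.fs.checkpointdir: hdfs://namenode:40010/flink/checkpoints
```

Please check the key names against the configuration docs for the Flink
version you run, since they may differ between releases.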

Best, Fabian

2017-04-07 16:34 GMT+02:00 Brian Wolfe <wolfe...@gmail.com>:

> Hi Kant,
>
> Jumping in here, would love corrections if I'm wrong about any of this.
>
> Short answer: no, HDFS is not necessary to run stateful stream
> processing. In the minimal case, you can use the MemoryStateBackend to back
> up your state onto the JobManager.
>
> In any production scenario, you will want more durability for your
> checkpoints and larger state size. To do this, you should use either
> RocksDBStateBackend or FsStateBackend. Assuming you want one of these, you
> will need a checkpoint directory on a filesystem that is accessible by all
> TaskManagers. The filesystem for this checkpointing directory
> (state.backend.*.checkpointdir) can be a shared drive or anything
> supported by the Hadoop file backend; see
> https://hadoop.apache.org/docs/stable/index.html
> under Hadoop Compatible File Systems for other alternatives (S3, for
> example).
>
> Choosing RocksDBStateBackend vs. FsStateBackend is a different decision.
> FsStateBackend stores in-flight state in memory and writes it to your
> durable filesystem only when checkpoints are initiated. The
> RocksDBStateBackend stores in-flight data on local disk (in RocksDB)
> instead of in-memory. When checkpoints are initiated, the appropriate state
> is then written to the durable filesystem. Because it stores state on disk,
> RocksDBStateBackend can handle much larger state than FsStateBackend on
> equivalent hardware.
>
> I'm drawing most of this from this page:
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html
>
> Does that make sense?
>
> Cheers,
> Wolfe
>
> ~
> Brian Wolfe
>
>
> On Fri, Apr 7, 2017 at 2:32 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi All,
>>
>> I read the docs, but I still have the following question: is HDFS
>> mandatory for stateful stream processing? In some places I see that it is
>> required, and in other places I see that RocksDB can be used. I just want
>> to know whether HDFS is mandatory for stateful stream processing.
>>
>> Thanks!
>>
>
>