Hi Soumya,

Operator state is partitioned across JVMs and not replicated. However, it is checkpointed (e.g., to HDFS) at regular intervals to guarantee fault tolerance with exactly-once semantics. In case of a failure, all operator states are recovered from the latest checkpoint.
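To make the checkpoint/recover cycle concrete, here is a minimal, self-contained sketch (plain Java, not Flink's actual API; all class and method names are illustrative): an operator keeps its partitioned state in a local map, a checkpoint takes a consistent copy of it (in Flink this copy would go to durable storage such as HDFS), and recovery resets the state to that copy.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of checkpointed operator state (not Flink's API).
class CountingOperator {
    private Map<String, Long> state = new HashMap<>();

    // Process one element: update local, non-replicated state.
    void process(String key) {
        state.merge(key, 1L, Long::sum);
    }

    // Take a snapshot of the state; in Flink this would be written
    // to a durable store like HDFS as part of a checkpoint.
    Map<String, Long> checkpoint() {
        return new HashMap<>(state);
    }

    // On failure, the state is reset to the last completed checkpoint,
    // which is what gives exactly-once semantics for state updates.
    void restore(Map<String, Long> snapshot) {
        state = new HashMap<>(snapshot);
    }

    long count(String key) {
        return state.getOrDefault(key, 0L);
    }
}
```

Elements processed after the snapshot are replayed from the source after recovery, so the state never reflects an element more or less than once.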
The state backend is responsible for local state management, i.e., for how the state that is queried and updated by an operator is stored and accessed. Using a state backend such as RocksDB, which goes to disk for every request, of course has higher latency than doing lookups/updates in a Java HashMap. What you gain is that your local state can grow much larger than the JVM's memory. Since the backend is pluggable, it would be possible to implement a backend with disk persistence and an in-memory cache.

Regarding the use case of firing an event, you can implement it with a custom stream operator in Flink.

Best, Fabian

2016-02-04 10:15 GMT+01:00 Soumya Simanta <soumya.sima...@gmail.com>:
> Fabian,
>
> Thanks a lot for your response. Really appreciated. I have some additional
> questions (please see inline).
>
> On Wed, Feb 3, 2016 at 2:42 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi,
>>
>> 1) At the moment, state is kept on the JVM heap in a regular HashMap.
>>
> Is this state replicated across JVMs in a cluster setup? This also implies
> that on a single-node Flink setup there is a chance of failure.
>
>> However, we added an interface for pluggable state backends. State
>> backends store the operator state (Flink's built-in window operators are
>> based on operator state as well). A pull request to add a RocksDB backend
>> (going to disk) will be merged soon [1]. Another backend using Flink's
>> managed memory is planned.
>>
> Will having state persistence have any impact on performance?
>
>> 2) I am not sure what you mean by trigger / schedule a delayed event, but
>> I have a few pointers that might be helpful:
>> - Flink can handle late-arriving events. Check the event-time feature [2].
>> - Flink's window triggers can be used to schedule window computations [3].
>> - You can implement a custom source function that emits / triggers events.
>
> My specific use case is firing an event (after a certain amount of time)
> based on some data computation that I'm performing on another stream.
>
>> Best,
>> Fabian
>>
>> [1] https://github.com/apache/flink/pull/1562
>> [2] http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
>> [3] http://flink.apache.org/news/2015/12/04/Introducing-windows.html
>>
>> 2016-02-03 5:39 GMT+01:00 Soumya Simanta <soumya.sima...@gmail.com>:
>>
>>> I'm getting started with Flink and had a very fundamental doubt.
>>>
>>> 1) Where does Flink capture/store intermediate state?
>>>
>>> For example, two streams of data have a common key. The streams can lag
>>> in time (seconds, hours, or even days). My understanding is that Flink
>>> somehow needs to store the data from the first (faster) stream so that it
>>> can match and join the data with the second (slower) stream.
>>>
>>> 2) Is there a mechanism to trigger/schedule a delayed event in Flink?
>>>
>>> Thanks
>>> -Soumya
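The "pluggable backend" idea above can be sketched in a few lines of plain Java. This is a hypothetical interface, not Flink's actual one: the operator only sees a narrow key/value API, so a heap-based backend can be swapped for a disk-backed one, including the disk-persistence-plus-in-memory-cache combination mentioned in the reply. The "disk" here is simulated with a second map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical state-backend interface (illustrative, not Flink's).
interface StateBackendSketch {
    void put(String key, String value);
    Optional<String> get(String key);
}

// Fast path: plain HashMap lookups, bounded by JVM heap size.
class HeapBackend implements StateBackendSketch {
    private final Map<String, String> map = new HashMap<>();
    public void put(String key, String value) { map.put(key, value); }
    public Optional<String> get(String key) { return Optional.ofNullable(map.get(key)); }
}

// "Persistent" backend with a write-through cache in front of the
// slow store (a map stands in for RocksDB/disk in this sketch).
class CachedDiskBackend implements StateBackendSketch {
    private final Map<String, String> disk = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();

    public void put(String key, String value) {
        cache.put(key, value);
        disk.put(key, value); // write-through to the durable store
    }

    public Optional<String> get(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = disk.get(key);              // cache miss: hit the slow store
            if (v != null) cache.put(key, v);
        }
        return Optional.ofNullable(v);
    }
}
```

Both backends behave identically from the operator's point of view; only latency and capacity differ, which is the trade-off described above.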
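For the specific use case in the thread (firing an event a certain amount of time after some computation on another stream), the custom-operator approach can be sketched like this. Again a toy model rather than Flink's API, with illustrative names: each element registers a timer at (element time + delay), and when the watermark passes a timer's timestamp the delayed event fires. Using event time and watermarks keeps the logic deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a custom operator that fires a delayed event (not Flink's API).
class DelayedEventOperator {
    private final long delayMs;
    // Pending timer timestamps, ordered so the earliest fires first.
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private final List<Long> firedAt = new ArrayList<>();

    DelayedEventOperator(long delayMs) { this.delayMs = delayMs; }

    // Called for each result element of the other stream's computation:
    // schedule an event at elementTime + delay.
    void onElement(long eventTimeMs) {
        timers.add(eventTimeMs + delayMs);
    }

    // Called when the watermark advances: fire every timer that is due.
    void onWatermark(long watermarkMs) {
        while (!timers.isEmpty() && timers.peek() <= watermarkMs) {
            firedAt.add(timers.poll()); // emit the delayed event downstream
        }
    }

    List<Long> fired() { return firedAt; }
}
```

The important property is that nothing fires until the watermark guarantees no earlier data can still arrive, so the delayed event is emitted exactly once per triggering element.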