Hi Naveen,

AFAIK, there are two levels of storage in a typical state backend (local and remote). I think it is roughly analogous to main memory and disk on a PC.
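
As a rough illustration, with the RocksDB backend the two levels show up as two separate settings in flink-conf.yaml. This is only a sketch: the option names are the standard Flink ones as far as I know, but the host name and paths are placeholders I made up, so adjust them for your cluster.

```yaml
# Sketch only: host name and paths below are placeholders, not real endpoints.

# Use RocksDB for working state.
state.backend: rocksdb

# Local level: where each TaskManager keeps its RocksDB files (local disk).
state.backend.rocksdb.localdir: /data/flink/rocksdb

# Remote level: where checkpoints are uploaded (e.g. HDFS).
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints

# Optional: upload only the changes since the last checkpoint.
state.backend.incremental: true
```

Reads and writes during processing only touch the local directory; the remote directory is written to when a checkpoint is taken.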

Take the RocksDB state backend as an example: window state (typically a very large ListState) is persisted in partitioned local RocksDB files, so adding an element to a window is a local, cheap operation. When a checkpoint starts, each of those RocksDB instances uploads to its corresponding HDFS directory separately. This works well because any intermediate state between two successful checkpoints can be overwritten, and local snapshots can be taken cheaply and asynchronously.

I have heard of folks trying to build a MySQL backend (deprecated), a remote RocksDB-as-a-service backend (hard to scale, and a performance bottleneck), and a Cassandra backend (hard to snapshot). All of them share the same weakness: they lack local, parallelizable snapshot semantics.

Hope this helps!

Chen

On Thu, Dec 27, 2018 at 8:27 AM miki haiat <miko5...@gmail.com> wrote:

> Did try to use rocksdb[1] as state backend?
>
> 1. https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
>
> On Thu, 27 Dec 2018, 18:17 Naveen Kumar <naveenkuma...@flipkart.com.invalid> wrote:
>
> > Hi,
> >
> > I am exploring if we can plugin hbase as state backend in Flink. We have
> > need for streaming jobs with large window states, high throughput and
> > reliability.
> >
> > I wanted to know if implementing Flink backend in Hbase or other
> > distributed KV store is possible. Any documentation or pointers will be
> > helpful.
> >
> > Thanks,
> > Naveen