Hi Yu, Very cool! I might be out of dated of what’s new in Flink already… Just wonder If there are efforts to support seconds level barrier alignment?
Chen > On Dec 27, 2018, at 23:26, Yu Li <car...@gmail.com> wrote: > > FWIW, one major advantage of adopting HBase as Flink statebackend is to > support direct read/write on DFS, so as to disaggregate storage and compute > (DisAgg). DisAgg has several benefits, such as supporting elastic > computing in cloud, much better (order of magnitude) recovery speed when > rescaling up/down (as Gyula also mentioned), etc. and we could eliminate > the performance regression compared to local RW through techniques like > adding a local L2 cache. More information please refer to our talk > <https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf> > at this year's Flink Forward China, and we could discuss more in another > thread if interested. > > Back to @Naveen's question here, we need to make HBase supporting embedded > mode first before adopting it as Flink statebackend. We have done some > initial work and please refer to HBASE-17743 > <https://issues.apache.org/jira/browse/HBASE-17743> and the design doc > there for more details. And for sure we will upstream our work when ready > to (smile). > > Best Regards, > Yu > > > On Fri, 28 Dec 2018 at 13:12, Chen Qin <qinnc...@gmail.com> wrote: > >> Hi Naveen, >> >> AFAIK, there are two level of storage in typical statebackend >> (local/remote). I think it kinda similar to what PC main memory and disk >> analogy. >> >> Take RocksDB Statebackend as example, window state (typical very large >> ListState) persisted in partitioned local rocksdb files, adding element to >> window is localized and cheap.When checkpoint starts, each of those rocksdb >> do upload to corresponding HDFS directories separately.This is good in a >> sense when any intermediate states between two successful checkpoints can >> be overwritten and local snapshots can be done cheaply and asynchronously. >> >> I heard folks tried to build mysqlbackend(deprecated), remote rocksdb as >> service backend(hard to scale and performance bottleneck) , Cassandra(hard >> to snapshot). All of which shares same trait on lack of local >> parallelizable snapshot semantic. >> >> Hope this helps! >> Chen >> >> On Thu, Dec 27, 2018 at 8:27 AM miki haiat <miko5...@gmail.com> wrote: >> >>> Did try to use rocksdb[1] as state backend? >>> >>> >>> 1. >>> >>> >> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend >>> >>> >>> On Thu, 27 Dec 2018, 18:17 Naveen Kumar <naveenkuma...@flipkart.com >>> .invalid >>> wrote: >>> >>>> Hi, >>>> >>>> I am exploring if we can plugin hbase as state backend in Flink. We >> have >>>> need for streaming jobs with large window states, high throughput and >>>> reliability. >>>> >>>> I wanted to know if implementing Flink backend in Hbase or other >>>> distributed KV store is possible. Any documentation or pointers will be >>>> helpful. >>>> >>>> Thanks, >>>> Naveen >>>> >>> >>