If I got it correctly, part of the motivation is to move rarely used / cold state to an external storage (please correct me if I'm wrong).
2017-01-20 11:35 GMT+01:00 Stephan Ewen <se...@apache.org>: > Hi! > > This is an interesting suggestion. > Just to make sure I understand it correctly: Do you design this for cases > where the state per machine is larger than that machines memory/disk? And > in that case, you cannot solve the problem by scaling out (having more > machines)? > > Stephan > > > On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin <c...@uber.com> wrote: > > > Hi there, > > > > I would like to discuss split over local states to external storage. The > > use case is NOT another external state backend like HDFS, rather just to > > expand beyond what local disk/ memory can hold when large key space > exceeds > > what task managers could handle. Realizing FLINK-4266 might be hard to > > tacking all-in-one, I would live give a shot to split-over first. > > > > An intuitive approach would be treat HeapStatebackend as LRU cache and > > split over to external key/value storage when threshold triggered. To > make > > this happen, we need minor refactor to runtime and adding set/get logic. > > One nice thing of keeping HDFS to store snapshots would be avoid > versioning > > conflicts. Once checkpoint restore happens, partial write data will be > > overwritten with previously checkpointed value. > > > > Comments? > > > > -- > > -Chen Qin > > >