Hi! This is an interesting suggestion. Just to make sure I understand it correctly: are you designing this for cases where the state per machine is larger than that machine's memory/disk, and where you cannot solve the problem by scaling out (adding more machines)?
Stephan

On Tue, Jan 17, 2017 at 6:29 AM, Chen Qin <c...@uber.com> wrote:

> Hi there,
>
> I would like to discuss split over of local state to external storage. The
> use case is NOT another external state backend like HDFS, but rather a way
> to expand beyond what local disk/memory can hold when a large key space
> exceeds what the task managers can handle. Realizing that FLINK-4266 might
> be hard to tackle all in one go, I would like to give split-over a shot first.
>
> An intuitive approach would be to treat HeapStatebackend as an LRU cache and
> split over to external key/value storage when a threshold is triggered. To
> make this happen, we need a minor refactor of the runtime and some added
> set/get logic. One nice thing about keeping HDFS to store snapshots is that
> it avoids versioning conflicts. Once a checkpoint restore happens, partially
> written data will be overwritten with the previously checkpointed value.
>
> Comments?
>
> --
> -Chen Qin
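
To make the LRU idea in the quoted proposal concrete, here is a minimal sketch of how I read it. It is not Flink code: the class SpillOverState, its methods, and the dict standing in for the external key/value store are all hypothetical names chosen for illustration, and the threshold is assumed to be a simple entry count rather than a memory size.

```python
from collections import OrderedDict


class SpillOverState:
    """Hypothetical keyed state: a bounded in-memory LRU cache that moves
    evicted entries to an external key/value store (any dict-like object)."""

    def __init__(self, max_local_entries, external_store):
        self.max_local_entries = max_local_entries  # assumed threshold: entry count
        self.local = OrderedDict()                  # least recently used entry first
        self.external = external_store              # stand-in for the external store

    def get(self, key, default=None):
        if key in self.local:
            self.local.move_to_end(key)             # mark as most recently used
            return self.local[key]
        if key in self.external:
            value = self.external.pop(key)          # promote back into local state
            self._put_local(key, value)
            return value
        return default

    def put(self, key, value):
        self.external.pop(key, None)                # drop any stale external copy
        self._put_local(key, value)

    def _put_local(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        # Threshold triggered: move the coldest entries to the external store.
        while len(self.local) > self.max_local_entries:
            old_key, old_value = self.local.popitem(last=False)
            self.external[old_key] = old_value

    def snapshot(self):
        # A checkpoint covers both tiers; local entries win on conflict.
        merged = dict(self.external)
        merged.update(self.local)
        return merged

    def restore(self, snapshot):
        # Restoring overwrites anything written after the checkpoint.
        self.local.clear()
        self.external.clear()
        for key, value in snapshot.items():
            self.put(key, value)
```

In this sketch the snapshot is the only source of truth on restore, so partially written data in the external store is simply discarded, which matches the versioning argument in the proposal. A real implementation would also have to decide how eviction interacts with asynchronous checkpoints, which the sketch ignores.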