Hi Yu, Sorry for the late reaction. As already discussed internally, I think this is a very good proposal and design that can help to improve a major limitation of the current state backend. I think that most discussion is happening in the design doc and I left my comments there. Looking forward to seeing this integrated with Flink soon!
Best, Stefan > On 24. May 2019, at 14:50, Yu Li <car...@gmail.com> wrote: > > Hi All, > > As mentioned in our speak[1] given in FlinkForwardChina2018, we have improved > HeapKeyedStateBackend to support disk spilling and put it in production here > in Alibaba for last year's Singles' Day. Now we're ready to upstream our work > and the design doc is up for review[2]. Please let us know your point of the > feature and any comment is welcomed/appreciated. > > We plan to keep the discussion open for at least 72 hours, and will create > umbrella jira and subtasks if no objections. Thanks. > > Below is a brief description about the motivation of the work, FYI: > > HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink, since > state lives as Java objects on the heap in HeapKeyedStateBackend and the > de/serialization only happens during state snapshot and restore, it > outperforms RocksDBKeyeStateBackend when all data could reside in memory. > However, along with the advantage, HeapKeyedStateBackend also has its > shortcomings, and the most painful one is the difficulty to estimate the > maximum heap size (Xmx) to set, and we will suffer from GC impact once the > heap memory is not enough to hold all state data. There’re several > (inevitable) causes for such scenario, including (but not limited to): > * Memory overhead of Java object representation (tens of times of the > serialized data size). > * Data flood caused by burst traffic. > * Data accumulation caused by source malfunction. > To resolve this problem, we proposed a solution to support spilling state > data to disk before heap memory is exhausted. We will monitor the heap usage > and choose the coldest data to spill, and reload them when heap memory is > regained after data removing or TTL expiration, automatically. > > [1] https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf > <https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf> > [2] > https://docs.google.com/document/d/1rtWQjIQ-tYWt0lTkZYdqTM6gQUleV8AXrfTOyWUZMf4/edit?usp=sharing > > <https://docs.google.com/document/d/1rtWQjIQ-tYWt0lTkZYdqTM6gQUleV8AXrfTOyWUZMf4/edit?usp=sharing> > Best Regards, > Yu