Hi Yu,

Sorry for the late reply. As already discussed internally, I think this is a 
very good proposal and design that can help address a major limitation of 
the current state backend. I think that most of the discussion is happening 
in the design doc, and I left my comments there. Looking forward to seeing 
this integrated with Flink soon!

Best,
Stefan

> On 24. May 2019, at 14:50, Yu Li <car...@gmail.com> wrote:
> 
> Hi All,
> 
> As mentioned in our talk [1] given at Flink Forward China 2018, we have 
> improved HeapKeyedStateBackend to support disk spilling and put it in 
> production at Alibaba for last year's Singles' Day. Now we're ready to 
> upstream our work, and the design doc is up for review [2]. Please let us 
> know what you think of the feature; any comments are welcome and appreciated.
> 
> We plan to keep the discussion open for at least 72 hours, and will create 
> an umbrella JIRA and subtasks if there are no objections. Thanks.
> 
> Below is a brief description of the motivation for the work, FYI:
> 
> HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink. Since 
> state lives as Java objects on the heap in HeapKeyedStateBackend and 
> de/serialization only happens during state snapshot and restore, it 
> outperforms RocksDBKeyedStateBackend when all data can reside in memory.
> However, HeapKeyedStateBackend also has its shortcomings. The most painful 
> one is the difficulty of estimating the maximum heap size (Xmx) to set, and 
> we suffer from GC impact once the heap memory is not enough to hold all 
> state data. There are several (inevitable) causes of such a scenario, 
> including (but not limited to):
> * Memory overhead of the Java object representation (tens of times the 
> serialized data size).
> * Data flood caused by burst traffic.
> * Data accumulation caused by source malfunction.
> To resolve this problem, we propose a solution that supports spilling state 
> data to disk before heap memory is exhausted. We will monitor the heap usage 
> and choose the coldest data to spill, and automatically reload it when heap 
> memory is regained after data removal or TTL expiration.
> 
> [1] https://files.alicdn.com/tpsservice/1df9ccb8a7b6b2782a558d3c32d40c19.pdf
> [2] https://docs.google.com/document/d/1rtWQjIQ-tYWt0lTkZYdqTM6gQUleV8AXrfTOyWUZMf4/edit?usp=sharing
> 
> Best Regards,
> Yu
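
For readers skimming the thread, the spilling idea in the quoted proposal (keep hot state on the heap, spill the coldest data when the heap budget is exceeded, reload it once memory is regained) can be sketched roughly as below. This is only a minimal illustration and not the implementation from the design doc: the class SpillableStateSketch and all of its methods are hypothetical, the heap budget is approximated by an entry count rather than by monitoring real heap usage, "coldest" is taken to mean least recently used, and the "disk" is simulated with an in-memory map of serialized bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: hot entries stay on heap, the coldest are spilled when over budget. */
class SpillableStateSketch {
    // Access-ordered map: iteration starts at the least recently used (coldest) entry.
    private final LinkedHashMap<String, String> onHeap = new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for the on-disk store; a real backend would write serialized bytes to files.
    private final Map<String, byte[]> spilled = new HashMap<>();
    // Assumption: the heap budget is approximated by a maximum number of entries.
    private final int maxHeapEntries;

    SpillableStateSketch(int maxHeapEntries) {
        this.maxHeapEntries = maxHeapEntries;
    }

    void put(String key, String value) {
        spilled.remove(key);          // a fresh write supersedes any spilled copy
        onHeap.put(key, value);
        spillIfNeeded();
    }

    String get(String key) {
        String value = onHeap.get(key);   // a hit also marks the entry as recently used
        if (value != null) {
            return value;
        }
        // Simplification: spilled data is read in place and not promoted back to the heap.
        byte[] bytes = spilled.get(key);
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    void remove(String key) {
        onHeap.remove(key);
        spilled.remove(key);
        reloadIfPossible();           // removal frees budget, so spilled data may come back
    }

    boolean isOnHeap(String key) { return onHeap.containsKey(key); }

    boolean isSpilled(String key) { return spilled.containsKey(key); }

    // Serialize and move the coldest entries off the heap until the budget is met.
    private void spillIfNeeded() {
        Iterator<Map.Entry<String, String>> it = onHeap.entrySet().iterator();
        while (onHeap.size() > maxHeapEntries && it.hasNext()) {
            Map.Entry<String, String> coldest = it.next();
            spilled.put(coldest.getKey(), coldest.getValue().getBytes(StandardCharsets.UTF_8));
            it.remove();
        }
    }

    // Deserialize spilled entries back onto the heap while there is free budget.
    private void reloadIfPossible() {
        Iterator<Map.Entry<String, byte[]>> it = spilled.entrySet().iterator();
        while (onHeap.size() < maxHeapEntries && it.hasNext()) {
            Map.Entry<String, byte[]> entry = it.next();
            onHeap.put(entry.getKey(), new String(entry.getValue(), StandardCharsets.UTF_8));
            it.remove();
        }
    }
}
```

With a budget of two entries, writing a third key spills whichever existing key was touched least recently, and removing a heap-resident key afterwards brings the spilled one back. TTL expiration, the other reload trigger mentioned in the proposal, is left out of this sketch.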
