Hi All,

We discussed this feature in an earlier thread [1], but I am opening a new
one because, on second thought, introducing a new backend is a better option
than modifying the existing heap backend: it avoids causing regressions or
surprises for existing in-production usage. Since introducing a new backend
is a relatively big change, we regard it as a FLIP, which needs another
discussion and voting process according to our newly drafted bylaws [2].

Please allow me to quote the brief description from the old thread [1] for
the convenience of those who are seeing this feature for the first time:


HeapKeyedStateBackend is one of the two KeyedStateBackends in Flink. Since
state lives as Java objects on the heap in HeapKeyedStateBackend and
de/serialization only happens during state snapshot and restore, it
outperforms RocksDBKeyedStateBackend when all data can reside in memory.

However, along with this advantage, HeapKeyedStateBackend also has its
shortcomings. The most painful one is the difficulty of estimating the
maximum heap size (Xmx) to set, and we suffer from GC impact once the heap
memory is not enough to hold all state data. There are several (inevitable)
causes for such a scenario, including (but not limited to):



* Memory overhead of Java object representation (tens of times the
  serialized data size).
* Data flood caused by burst traffic.
* Data accumulation caused by source malfunction.

To resolve this problem, we proposed a solution that supports spilling state
data to disk before heap memory is exhausted. We will monitor heap usage and
choose the coldest data to spill, and automatically reload it when heap
memory is regained after data removal or TTL expiration.

Furthermore, to prevent causing unexpected regressions in existing usage of
HeapKeyedStateBackend, we plan to introduce a new
SpillableHeapKeyedStateBackend and make it the default in the future if it
proves to be stable.
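
To make the spill-and-reload idea a bit more concrete, here is a minimal
sketch in plain Java. It is only an illustration and not the proposed
implementation: the class name SpillableMapSketch, the entry-count budget
(standing in for real heap-usage monitoring), the least-recently-used notion
of "coldest", and the in-memory byte[] map (standing in for disk) are all
assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; not the SpillableHeapKeyedStateBackend design. */
final class SpillableMapSketch {
    // Access-ordered map: iteration starts at the least recently used
    // ("coldest") entry. Values here are live Java objects.
    private final LinkedHashMap<String, String> onHeap =
            new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for disk: values are kept only in serialized form.
    private final Map<String, byte[]> spilled = new HashMap<>();
    // Assumption: an entry budget replaces real heap-usage monitoring.
    private final int maxOnHeapEntries;

    SpillableMapSketch(int maxOnHeapEntries) {
        this.maxOnHeapEntries = maxOnHeapEntries;
    }

    void put(String key, String value) {
        spilled.remove(key); // the fresh value supersedes any spilled copy
        onHeap.put(key, value);
        spillColdestIfNeeded();
    }

    String get(String key) {
        String value = onHeap.get(key); // also marks the entry as recently used
        if (value != null) {
            return value;
        }
        byte[] bytes = spilled.get(key); // spilled reads pay deserialization
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    void remove(String key) {
        onHeap.remove(key);
        spilled.remove(key);
        reloadIfRoomAvailable(); // memory was regained, so bring data back
    }

    boolean isOnHeap(String key) { return onHeap.containsKey(key); }

    boolean isSpilled(String key) { return spilled.containsKey(key); }

    // Move the coldest entries off the heap until we are within budget.
    private void spillColdestIfNeeded() {
        Iterator<Map.Entry<String, String>> it = onHeap.entrySet().iterator();
        while (onHeap.size() > maxOnHeapEntries && it.hasNext()) {
            Map.Entry<String, String> coldest = it.next();
            spilled.put(coldest.getKey(),
                    coldest.getValue().getBytes(StandardCharsets.UTF_8));
            it.remove();
        }
    }

    // Reload spilled entries (in no particular order) while there is room.
    private void reloadIfRoomAvailable() {
        Iterator<Map.Entry<String, byte[]>> it = spilled.entrySet().iterator();
        while (onHeap.size() < maxOnHeapEntries && it.hasNext()) {
            Map.Entry<String, byte[]> entry = it.next();
            onHeap.put(entry.getKey(),
                    new String(entry.getValue(), StandardCharsets.UTF_8));
            it.remove();
        }
    }
}
```

With a budget of two entries, putting "a" and "b", reading "a", and then
putting "c" spills "b" (the coldest); removing "a" afterwards frees room and
"b" is reloaded. A real backend would presumably trigger on measured heap
usage and GC behavior rather than an entry count, and would write to actual
files.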

Please let us know your thoughts on this feature; any comments are welcome
and appreciated. Thanks.

[1] https://s.apache.org/pxeif
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026

Best Regards,
Yu
