Hi,

Very happy to see that the incremental checkpoint idea is finally becoming a 
reality for the heap backend! Overall the proposal looks pretty good to me. 
Just wanted to point out one possible improvement from what I can still 
remember from my ideas back then: I think you can avoid doing periodic full 
snapshots for consolidation. Instead, my suggestion would be to track the 
version numbers you encounter while you iterate a snapshot for writing it - and 
then you should be able to prune all incremental snapshots that were performed 
with a version number smaller than the minimum you find. To avoid the problem 
of very old entries that never get modified you could start spilling entries 
with a certain age-difference compared to the current map version so that 
eventually all entries for an old version are re-written to newer snapshots. 
You can track the version up to which this was done in the map and then you can 
again let go of their corresponding snapshots after a guaranteed time. So 
instead of having the burden of periodic large snapshots, you can make every 
snapshot do a little bit of the cleanup, and if you are lucky it might happen 
mostly by itself if most entries are frequently updated. I would also consider 
making a map clear a special event in your log and unticking the 
versions on this event - this allows you to let go of old snapshots and saves 
you from writing a lot of antimatter entries. Maybe the ideas are still useful 
to you.
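
To make the pruning idea a bit more concrete, here is a rough sketch of what I 
have in mind. It is not Flink code: the class, the String-to-String map and the 
maxAge parameter are all made up for illustration, and it ignores removals 
(antimatter entries) and asynchronous snapshotting entirely. It only shows the 
version bookkeeping: stamp entries on write, spill entries that are too old, 
track the minimum version seen while iterating, and drop every incremental 
snapshot older than that minimum.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Flink code: a versioned map whose incremental snapshots prune themselves. */
class VersionPruningSketch {

    /** A value plus the map version of the snapshot that contains (or will contain) it. */
    private static final class Entry {
        final String value;
        int version;
        Entry(String value, int version) { this.value = value; this.version = version; }
    }

    private final Map<String, Entry> state = new HashMap<>();
    /** Retained incremental snapshots, keyed by the map version they were taken at. */
    private final TreeMap<Integer, Map<String, String>> snapshots = new TreeMap<>();
    private int mapVersion = 0;

    void put(String key, String value) {
        // Stamp the entry with the current version so the next snapshot picks it up.
        state.put(key, new Entry(value, mapVersion));
    }

    /**
     * Takes an incremental snapshot. Entries that are at least maxAge versions old
     * (an assumed tuning knob) are re-written into this snapshot.
     */
    void snapshot(int maxAge) {
        int current = mapVersion;
        int minVersion = current;
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, Entry> e : state.entrySet()) {
            Entry entry = e.getValue();
            if (current - entry.version >= maxAge) {
                entry.version = current; // spill: re-stamp the old entry
            }
            if (entry.version == current) {
                delta.put(e.getKey(), entry.value); // modified or spilled
            }
            minVersion = Math.min(minVersion, entry.version);
        }
        snapshots.put(current, delta);
        // No live entry depends on a snapshot older than the minimum version seen.
        snapshots.headMap(minVersion).clear();
        mapVersion++;
    }

    List<Integer> retainedVersions() {
        return new ArrayList<>(snapshots.keySet());
    }

    /** Rebuilds the state by replaying the retained snapshots from oldest to newest. */
    Map<String, String> restore() {
        Map<String, String> restored = new HashMap<>();
        for (Map<String, String> delta : snapshots.values()) {
            restored.putAll(delta);
        }
        return restored;
    }

    public static void main(String[] args) {
        VersionPruningSketch s = new VersionPruningSketch();
        s.put("cold", "x");
        for (int i = 0; i < 5; i++) {
            s.put("hot", "v" + i);
            s.snapshot(3);
            System.out.println("retained snapshots: " + s.retainedVersions());
        }
    }
}
```

In this sketch no more than maxAge snapshots are ever retained, because after 
each snapshot every entry carries a version newer than (current - maxAge). If 
all entries are updated between snapshots, only the latest snapshot survives 
and no spilling happens at all.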

Best,
Stefan

On 2020/11/04 01:54:25, Khachatryan Roman <k...@gmail.com> wrote: 
> Hi devs,
> 
> I'd like to start a discussion of FLIP-151: Incremental snapshots for
> heap-based state backend [1]
> 
> Heap backend, while being limited state sizes fitting into memory, also has
> some advantages compared to RocksDB backend:
> 1. Serialization once per checkpoint, not per state modification. This
> allows to “squash” updates to the same keys
> 2. Shorter synchronous phase (compared to RocksDB incremental)
> 3. No need for sorting and compaction, no IO amplification and JNI overhead
> This can potentially give higher throughput and efficiency.
> 
> However, Heap backend currently lacks incremental checkpoints. This FLIP
> aims to add initial support for them.
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
> 
> Any feedback highly appreciated.
> 
> Regards,
> Roman
