Hi Stefan,

Thanks for your reply. Very interesting ideas!
If I understand correctly, SharedStateRegistry will still be responsible
for pruning the old state; for that, it will maintain some (ordered)
mapping between StateMaps and their versions, per key group.
I think one modification to this approach is needed to support journaling:
for each entry, maintain the version at which it was last fully snapshotted,
and use this version to find the minimum as you described above.
I'm considering better state cleanup and optimization of removals as the
next steps. Anyway, I will add it to the FLIP document.
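To make the pruning rule concrete, here is a minimal Java sketch. It assumes a single in-memory map with String keys and no removals, and it models "writing an entry into snapshot N" only by recording N. The class VersionPruningSketch, the maxAge parameter and all method names are hypothetical and are not part of SharedStateRegistry or the heap backend. Each entry keeps the version at which it was last fully written. Entries that have not been rewritten for more than maxAge versions are re-spilled into the current snapshot. Every snapshot older than the minimum of these versions can then be discarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Flink code: one in-memory map, no removals.
public class VersionPruningSketch {
    private final Map<String, String> values = new HashMap<>();
    // Version of the snapshot that last fully wrote each key's current value.
    private final Map<String, Long> lastWrittenVersion = new HashMap<>();
    // Keys modified since the previous snapshot.
    private final Set<String> dirty = new HashSet<>();
    // Versions of incremental snapshots that are still retained.
    private final TreeSet<Long> retained = new TreeSet<>();
    // Entries not rewritten for more than maxAge versions are re-spilled.
    private final long maxAge;
    private long version = 0;

    public VersionPruningSketch(long maxAge) {
        this.maxAge = maxAge;
    }

    public void put(String key, String value) {
        values.put(key, value);
        dirty.add(key);
    }

    /** Takes the next incremental snapshot; returns the versions that can be discarded. */
    public List<Long> snapshot() {
        long current = ++version;
        for (String key : values.keySet()) {
            Long last = lastWrittenVersion.get(key);
            boolean tooOld = last != null && current - last > maxAge;
            if (dirty.contains(key) || tooOld) {
                // Writing the entry into snapshot `current` is modeled by recording the version.
                lastWrittenVersion.put(key, current);
            }
        }
        dirty.clear();
        retained.add(current);

        // Every live value is in a snapshot >= the minimum last-written version,
        // so all older snapshots hold only superseded copies.
        long min = current;
        for (long v : lastWrittenVersion.values()) {
            min = Math.min(min, v);
        }
        NavigableSet<Long> prunable = retained.headSet(min, false);
        List<Long> pruned = new ArrayList<>(prunable);
        prunable.clear(); // removes them from `retained` through the view
        return pruned;
    }

    public Set<Long> retained() {
        return retained;
    }
}
```

With maxAge = 2, writing keys "a" and "b" and then updating only "a" before each of the next three snapshots keeps snapshots 1 to 3 alive until snapshot 4 re-spills the untouched "b"; at that point snapshots 1, 2 and 3 are returned as prunable. With a very large maxAge, "b" would pin snapshot 1 forever, which is the case the age-based re-spilling is meant to handle.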

Thanks!

Regards,
Roman


On Tue, Nov 10, 2020 at 12:04 AM Stefan Richter <stefanrichte...@gmail.com>
wrote:

> Hi,
>
> Very happy to see that the incremental checkpoint idea is finally becoming
> a reality for the heap backend! Overall the proposal looks pretty good to
> me. Just wanted to point out one possible improvement from what I can still
> remember from my ideas back then: I think you can avoid doing periodic full
> snapshots for consolidation. Instead, my suggestion would be to track the
> version numbers you encounter while you iterate a snapshot for writing it -
> and then you should be able to prune all incremental snapshots that were
> performed with a version number smaller than the minimum you find. To avoid
> the problem of very old entries that never get modified you could start
> spilling entries with a certain age-difference compared to the current map
> version so that eventually all entries for an old version are re-written to
> newer snapshots. You can track the version up to which this was done in the
> map and then you can again let go of their corresponding snapshots after a
> guaranteed time. So instead of having the burden of periodic large
> snapshots, you can make every snapshot work a little bit on the cleanup and
> if you are lucky it might happen mostly by itself if most entries are
> frequently updated. I would also consider making a map clear a special event
> in your log and untracking the versions on this event; this allows
> you to let go of old snapshots and saves you from writing a lot of
> antimatter entries. Maybe the ideas are still useful to you.
>
> Best,
> Stefan
>
> On 2020/11/04 01:54:25, Khachatryan Roman <k...@gmail.com> wrote:
> > Hi devs,
> >
> > I'd like to start a discussion of FLIP-151: Incremental snapshots for
> > heap-based state backend [1]
> >
> > The heap backend, while limited to state sizes that fit into memory, also
> > has some advantages compared to the RocksDB backend:
> > 1. Serialization happens once per checkpoint, not per state modification.
> > This allows “squashing” updates to the same keys.
> > 2. Shorter synchronous phase (compared to RocksDB incremental).
> > 3. No need for sorting and compaction, no IO amplification, and no JNI
> > overhead.
> > This can potentially give higher throughput and efficiency.
> >
> > However, the heap backend currently lacks incremental checkpoints. This
> > FLIP aims to add initial support for them.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
> >
> > Any feedback is highly appreciated.
> >
> > Regards,
> > Roman
> >
