bq. we add the key-group to the heap format (1-2 bytes extra per key).

This seems to be the better choice of the two.
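
To make that concrete, the extra bytes would essentially be a key-group id
written before each serialized key, similar to what RocksDB already encodes
at the front of its key bytes. A rough Java sketch of the idea (illustrative
only, not actual backend code; keyGroupFor is a made-up stand-in for Flink's
KeyGroupRangeAssignment#assignToKeyGroup):

  // Sketch only: prefix each entry with its key-group id
  // (1 byte if maxParallelism fits in a byte, otherwise 2 bytes).
  import java.io.DataOutputStream;
  import java.io.IOException;

  class KeyGroupPrefixedWriter {

      // Hypothetical stand-in for KeyGroupRangeAssignment#assignToKeyGroup.
      static int keyGroupFor(Object key, int maxParallelism) {
          return Math.floorMod(key.hashCode(), maxParallelism);
      }

      static void writeEntry(DataOutputStream out, Object key,
                             byte[] keyBytes, byte[] valueBytes,
                             int maxParallelism) throws IOException {
          int keyGroup = keyGroupFor(key, maxParallelism);
          if (maxParallelism <= 0xFF) {
              out.writeByte(keyGroup);   // 1 extra byte per key
          } else {
              out.writeShort(keyGroup);  // 2 extra bytes per key
          }
          out.write(keyBytes);
          out.write(valueBytes);
      }
  }

On restore, the reader could then route entries to key-group ranges without
deserializing the key and recomputing its hash, which is exactly the cost
the other option would pay.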

bq. change the heap backends to write in the same way as RocksDB

+1 on the above.

These two combined would give users flexibility in state backend migration.
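
For reference, the difference between the two write schemes boils down to
something like the sketch below (Java, illustrative only and not the actual
snapshot code; in the real format entries are length-delimited, so an end
marker cannot collide with data):

  // Sketch only: count-prefixed (heap today) vs. sentinel-terminated
  // (RocksDB today) ways of writing the key-value pairs of a state.
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.util.Map;

  class SnapshotFormatSketch {

      // Heap style: the exact entry count is known up front.
      static void writeCountPrefixed(DataOutputStream out,
                                     Map<byte[], byte[]> entries) throws IOException {
          out.writeInt(entries.size());
          for (Map.Entry<byte[], byte[]> e : entries.entrySet()) {
              out.write(e.getKey());
              out.write(e.getValue());
          }
      }

      // RocksDB style: no cheap exact count, so iterate and end with a marker.
      private static final byte END_MARKER = (byte) 0xFF;

      static void writeSentinelTerminated(DataOutputStream out,
                                          Iterable<Map.Entry<byte[], byte[]>> entries)
              throws IOException {
          for (Map.Entry<byte[], byte[]> e : entries) {
              out.write(e.getKey());
              out.write(e.getValue());
          }
          out.writeByte(END_MARKER);
      }
  }

Moving the heap backends to the second scheme only touches the writer; as
Stefan explains below, going the other way (emulating the count-prefixed
format with RocksDB) would require a full iteration over the state just to
obtain the counts.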

On Thu, Aug 17, 2017 at 2:55 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> This is not possible out of the box. Historically, the
> checkpoint/savepoint formats have been different between heap based and
> RocksDB based backends. We have already eliminated most differences in 1.3.
>
> However, there are two problems remaining. The first problem is just how
> the number of written key-value pairs is tracked: in the heap case, we have
> all the counts and can serialize: count + iterate all elements. For RocksDB
> this is not possible because there is no way to get the exact key count, so
> the serialization iterates all elements and then writes a terminal symbol
> to the stream. So what we could consider in the future is to change the
> heap backends to write in the same way as RocksDB, which is a slightly less
> natural approach, but much easier than trying to emulate the current heap
> approach with RocksDB. The latter would require us to do an iteration over
> the whole state just to obtain the counts.
>
> The second problem is about key-groups: we keep all k/v pairs in RocksDB
> in key-group order by prefixing the key bytes with the key-group id (this
> is important for rescaling). When we write the elements from RocksDB to
> disk, they include the prefix. For the heap backend, this prefix is not
> required because (as we are in memory) we can very efficiently do the
> key-group partitioning in the async part of a checkpoint/savepoint. So here
> we have two options, both with some pros and cons. Either we don’t write the
> key-group bytes with RocksDB and recompute them on restore (this means we
> have to go through serde and compute a hash) or we add the key-group to the
> heap format (1-2 bytes extra per key).
>
> So, it is definitely possible to unify both formats completely, but those
> two points need to be resolved.
>
> Hope this gives some more details to the discussion.
>
> Best,
> Stefan
>
> > On 17.08.2017 at 10:50, Biplob Biswas <revolutioni...@gmail.com> wrote:
> >
> > I am not really sure you can do that out of the box; if not, it should
> > indeed be possible in the near future.
> >
> > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility
> >
> > There are already plans for state migration (with upgraded serializers)
> > as I read here, so this could be an additional task while migrating states,
> > although I am not sure how easy or difficult this could be.
> >
> > Also, as you can read here,
> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html
> >
> > Stefan really nicely explained what is (or would be) happening during
> > state migration on different backends.
> >
> > So based on that, what I can imagine is that moving from FsStateBackend to
> > RocksDbStateBackend or from MemoryStateBackend to RocksDbStateBackend
> > would be easier, but not the other way round.
> >
> > Thanks
> >
> >
> >
>
>
