bq. we add the key-group to the heap format (1-2 bytes extra per key).

This seems to be the better choice of the two.
bq. change the heap backends to write in the same way as RocksDB

+1 on above. These two combined would give users flexibility in state backend migration.

On Thu, Aug 17, 2017 at 2:55 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> This is not possible out of the box. Historically, the checkpoint/savepoint formats have been different between the heap based and RocksDB based backends. We have already eliminated most differences in 1.3.
>
> However, there are two problems remaining. The first problem is just how the number of written key-value pairs is tracked: in the heap case, we have all the counts and can serialize: count + iterate all elements. For RocksDB this is not possible because there is no way to get the exact key count, so the serialization iterates all elements and then writes a terminal symbol to the stream. So what we could consider in the future is to change the heap backends to write in the same way as RocksDB, which is a slightly less natural approach, but much easier than trying to emulate the current heap approach with RocksDB. The latter would require us to do an iteration over the whole state just to obtain the counts.
>
> The second problem is about key-groups: we keep all k/v pairs in RocksDB in key-group order by prefixing the key bytes with the key-group id (this is important for rescaling). When we write the elements from RocksDB to disk, they include the prefix. For the heap backend, this prefix is not required because (as we are in memory) we can very efficiently do the key-group partitioning in the async part of a checkpoint/savepoint. So here we have two options, both with some pros and cons. Either we don’t write the key-group bytes with RocksDB and recompute them on restore (this means we have to go through serde and compute a hash), or we add the key-group to the heap format (1-2 bytes extra per key).
>
> So, it is definitely possible to unify both formats completely, but those two points need to be resolved.
>
> Hope this gives some more details to the discussion.
>
> Best,
> Stefan
>
> > On 17.08.2017 at 10:50, Biplob Biswas <revolutioni...@gmail.com> wrote:
> >
> > I am not really sure you can do that out of the box; if not, indeed that should be possible in the near future.
> >
> > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility
> >
> > There are already plans for state migration (with upgraded serializers) as I read here, so this could be an additional task while migrating states, although I am not sure how easy or difficult this could be.
> >
> > Also, as you can read here,
> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Evolving-serializers-and-impact-on-flink-managed-states-td14777.html
> >
> > Stefan really nicely explained what would/is happening on state migration on different backends.
> >
> > So based on that, what I can imagine is moving from FsStateBackend to RocksDbStateBackend or from MemoryStateBackend to RocksDbStateBackend would be easier, but not the other way round.
> >
> > Thanks
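To make the first point a bit more concrete, here is a minimal sketch of the two write patterns described above: the heap backends can write a count followed by the entries, while RocksDB has to stream the entries and close with a terminal marker. This is illustrative only, not Flink's actual checkpoint serialization code; the class and marker names are made up.

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

/** Illustrative only -- not the actual Flink checkpoint code. */
public class StateWriteFormats {

    /** Heap-style: the entry count is known up front, so it is written first. */
    static void writeCountPrefixed(Map<byte[], byte[]> state, DataOutputStream out)
            throws IOException {
        out.writeInt(state.size());
        for (Map.Entry<byte[], byte[]> entry : state.entrySet()) {
            writeEntry(entry.getKey(), entry.getValue(), out);
        }
    }

    /**
     * RocksDB-style: the total count is unknown, so entries are streamed and the
     * stream is closed with a terminal marker instead of a leading count.
     */
    static void writeSentinelTerminated(Iterable<Map.Entry<byte[], byte[]>> entries,
            DataOutputStream out) throws IOException {
        for (Map.Entry<byte[], byte[]> entry : entries) {
            writeEntry(entry.getKey(), entry.getValue(), out);
        }
        // -1 can never be a valid key length, so a reader can treat it as "end of state".
        out.writeInt(END_OF_STATE_MARKER);
    }

    private static final int END_OF_STATE_MARKER = -1; // hypothetical terminal symbol

    private static void writeEntry(byte[] key, byte[] value, DataOutputStream out)
            throws IOException {
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(value.length);
        out.write(value);
    }
}
{code}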
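And a similarly hedged sketch of the key-group prefixing behind the second point. The hash function and byte layout here are simplified stand-ins, not Flink's real key-group assignment or key layout, but they show where the extra 1-2 bytes per key would come from if the heap format adopted the prefix.

{code:java}
import java.nio.ByteBuffer;

/** Illustrative only -- simplified, not Flink's real key-group assignment or key layout. */
public class KeyGroupPrefixing {

    /** Assign a key to a key group; Flink hashes the key, plain hashCode() here for brevity. */
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /**
     * Prefix the serialized key bytes with the key-group id so that a byte-ordered store
     * such as RocksDB keeps all entries of one key group contiguous (needed for rescaling).
     * One prefix byte is enough for up to 256 key groups, otherwise two bytes are used --
     * the "1-2 bytes extra per key" mentioned above.
     */
    static byte[] prefixWithKeyGroup(int keyGroup, byte[] serializedKey, int maxParallelism) {
        int prefixLength = maxParallelism <= 256 ? 1 : 2;
        ByteBuffer buffer = ByteBuffer.allocate(prefixLength + serializedKey.length);
        if (prefixLength == 1) {
            buffer.put((byte) keyGroup);
        } else {
            buffer.putShort((short) keyGroup);
        }
        buffer.put(serializedKey);
        return buffer.array();
    }
}
{code}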