Hi Seth,
that sounds reasonable. What I was asking for was not to reverse
engineer the binary format, but to make the savepoint write API a little
more reusable, so that it could be wrapped into some other technology. I
don't know the details well enough to propose a solution, but it seems to
me that it could be possible to use something like a Writer instead of a
Transform. Or maybe the Transform can use the Writer internally; the
goal is just to make it possible to create the savepoint from "outside"
of Flink (with some library, of course).
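To make the idea concrete, here is a minimal sketch of the kind of runner-agnostic writer interface I have in mind, so that a non-Flink system (Spark, Beam, ...) could feed keyed state into it. All names here are illustrative, not existing Flink API; the in-memory implementation is only a stand-in for whatever would delegate to Flink's savepoint-writing internals:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, illustrative interface: a generic writer for keyed state,
// decoupled from any particular runner. Not part of Flink.
interface StateWriter<K, V> {
    // Stage one keyed-state entry.
    void write(K key, V value);

    // Flush staged entries; here they are returned as strings as a
    // stand-in for the savepoint files a real implementation would emit.
    List<String> finish();
}

// Minimal in-memory implementation, just to show the shape of the API.
class InMemoryStateWriter<K, V> implements StateWriter<K, V> {
    private final List<String> staged = new ArrayList<>();

    @Override
    public void write(K key, V value) {
        staged.add(key + "=" + value);
    }

    @Override
    public List<String> finish() {
        return new ArrayList<>(staged);
    }
}
```

Any runner that can iterate over its own data could then drive such a writer, while Flink-side internals (serializers, operator IDs, state backends) stay hidden behind the implementation.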
Jan
On 5/31/19 1:17 PM, Seth Wiesman wrote:
@Konstantin agreed, that was a large impetus for this feature. Also I am
happy to change the name to something that better describes the feature set.
@Jan
Savepoints depend heavily on a number of Flink internal components that may
change between versions: state backend internals, type serializers, the
specific hash function used to turn a UID into an OperatorID, etc. I consider
it a feature of this proposal that the library depends on those internal
components instead of reverse engineering the binary format. This way, as those
internals change, or new state features are added (think the recent addition of
TTL), we will get support automatically. I do not believe anything else is
maintainable.
Seth
On May 31, 2019, at 5:56 AM, Jan Lukavský <je...@seznam.cz> wrote:
Hi,
this is awesome, and a really useful feature. If I might ask for one thing to
consider - would it be possible to make the Savepoint manipulation API (at
least writing the Savepoint) less dependent on other parts of Flink internals
(e.g. |KeyedStateBootstrapFunction|) and to provide something more general (e.g.
some generic Writer)? Why I'm asking for this - I can totally imagine a
situation where users might want to create bootstrapped state with some other
runner (e.g. Apache Spark), and then run Apache Flink after the state has been
created. This makes even more sense in the context of Apache Beam, which
provides all the necessary work to make this happen. The question is - would it
be possible to design this feature so that writing the savepoint from a
different runner would be possible?
Cheers,
Jan
On 5/30/19 1:14 AM, Seth Wiesman wrote:
Hey Everyone!
Gordon and I have been discussing adding a savepoint connector to Flink for
reading, writing, and modifying savepoints.
This is useful for:

- Analyzing state for interesting patterns
- Troubleshooting or auditing jobs by checking for discrepancies in state
- Bootstrapping state for new applications
- Modifying savepoints, such as:
  - Changing max parallelism
  - Making breaking schema changes
  - Correcting invalid state
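As a rough sketch of how the bootstrapping case might look with the proposed
API (class and method names here follow the FLIP draft and may well change
before anything is released, so treat this as illustrative only):

```java
// Hedged sketch, not final API: bootstrap keyed state for a new application.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Account> accounts = env.fromCollection(existingAccounts);

BootstrapTransformation<Account> transformation = OperatorTransformation
    .bootstrapWith(accounts)
    .keyBy(account -> account.id)
    .transform(new AccountBootstrapper()); // a KeyedStateBootstrapFunction

Savepoint
    .create(new MemoryStateBackend(), 128)        // state backend, max parallelism
    .withOperator("accounts-uid", transformation) // UID must match the streaming job
    .write("file:///savepoints/bootstrap");
```

The streaming job that later restores from this savepoint would assign the
same UID to the corresponding operator.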
We are looking forward to your feedback!
This is the FLIP:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+Savepoint+Connector
Seth