The version in the state is the serializer version, and applies to the
entire state, independent of what it contains.
If you use Kryo2 for reading and Kryo5 for writing (which also implies
writing the new serializer version into state), then I'd assume that a
migration is an all-or-nothing kind of deal.
IOW, you'd have to load a savepoint and write out an entirely new
savepoint with the new state.
Otherwise you may have only re-written part of the checkpoint, and now
contains a mix of Kryo2/Kryo5 serialized classes, which should then fail
_hard_ on any recovery attempt because we wouldn't use Kryo2 to read
anything.
If I'm right, then as is this sounds like quite a trap for users to fall
into because from what I gathered this is the default behavior in the PR
(I could be wrong though since I haven't fully dug through the 8k lines
PR yet...)
What we kind of want is this:
1) Kryo5 is used as the default for new jobs. (maybe not even that,
making it an explicit opt-in)
2) Kryo2 is used for reading AND writing for existing* jobs by default.
3) Users can explicitly (and easily!) do a full migration of their jobs,
after which 2) should no longer apply.
In the PR you mentioned running into issues on Java 17; to have have
some error stacktraces and examples data/serializers still around?
On 30/05/2023 00:38, Kurt Ostfeld wrote:
I’d assumed that there wasn’t a good way to migrate state stored with an older
version of Kryo to a newer version - if you’ve solved that, kudos.
I hope I've solved this. The pull request is supposed to do exactly this.
Please let me know if you can propose a scenario that would break this.
The pull-request has both Kryo 2.x and 5.x dependencies. It looks at the state
version number written to the state to determine which version of Kryo to use
for deserialization. Kryo 2.x is not used to write new state.
------- Original Message -------
On Monday, May 29th, 2023 at 5:17 PM, Ken Krugler <kkrugler_li...@transpac.com>
wrote:
Hi Kurt,
I personally think it’s a very nice improvement, and that the longer-term goal
of removing built-in Kryo support for state serialization (while a good one)
warrants a separate FLIP.
Perhaps an intermediate approach would be to disable the use of Kryo for state
serialization by default, and force a user to disregard warnings and explicitly
enable it if they want to go down that path.
I’d assumed that there wasn’t a good way to migrate state stored with an older
version of Kryo to a newer version - if you’ve solved that, kudos.
— Ken
On May 29, 2023, at 2:21 PM, Kurt Ostfeld kurtostf...@proton.me.INVALID wrote:
Hi everyone. I would like to start the discussion thread for FLIP-317: Upgrade
Kryo from 2.24.0 to 5.5.0 [1].
There is a pull-request associated with this linked in the FLIP.
I'd particularly like to hear about:
- Chesnay Schepler's request to consider removing Kryo serializers from the
execution config. Is this a reasonable task to add into this FLIP? Is there
consensus on how to resolve that? Would that be better addressed in a separate
future FLIP after the Kryo upgrade FLIP is completed?
- Backwards compatibility. The automated CI tests have a lot of backwards
compatibility tests that are passing. I also wrote a Flink application with
keyed state using custom Kryo v2 serializers and then an upgraded version with
both Kryo v2 and Kryo v5 serializers to stress test the upgrade process. I'd
like to hear about additional scenarios that need to be tested.
- Is this worth pursuing or is the Flink project looking to go in a different
direction? I'd like to do some more work on the pull request if this is being
seriously considered for adoption.
I'm looking forward to hearing everyone's feedback and suggestions.
Thank you,
Kurt
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-317%3A+Upgrade+Kryo+from+2.24.0+to+5.5.0
--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch