The version in the state is the serializer version, and applies to the entire state, independent of what it contains. If you use Kryo2 for reading and Kryo5 for writing (which also implies writing the new serializer version into state), then I'd assume that a migration is an all-or-nothing kind of deal. IOW, you'd have to load a savepoint and write out an entirely new savepoint with the new state. Otherwise you may have only re-written part of the checkpoint, and now contains a mix of Kryo2/Kryo5 serialized classes, which should then fail _hard_ on any recovery attempt because we wouldn't use Kryo2 to read anything.

If I'm right, then as is this sounds like quite a trap for users to fall into because from what I gathered this is the default behavior in the PR (I could be wrong though since I haven't fully dug through the 8k lines PR yet...)

What we kind of want is this:
1) Kryo5 is used as the default for new jobs. (maybe not even that, making it an explicit opt-in)
2) Kryo2 is used for reading AND writing for existing* jobs by default.
3) Users can explicitly (and easily!) do a full migration of their jobs, after which 2) should no longer apply.



In the PR you mentioned running into issues on Java 17; to have have some error stacktraces and examples data/serializers still around?

On 30/05/2023 00:38, Kurt Ostfeld wrote:
I’d assumed that there wasn’t a good way to migrate state stored with an older 
version of Kryo to a newer version - if you’ve solved that, kudos.
I hope I've solved this. The pull request is supposed to do exactly this. 
Please let me know if you can propose a scenario that would break this.

The pull-request has both Kryo 2.x and 5.x dependencies. It looks at the state 
version number written to the state to determine which version of Kryo to use 
for deserialization. Kryo 2.x is not used to write new state.

------- Original Message -------
On Monday, May 29th, 2023 at 5:17 PM, Ken Krugler <kkrugler_li...@transpac.com> 
wrote:



Hi Kurt,

I personally think it’s a very nice improvement, and that the longer-term goal 
of removing built-in Kryo support for state serialization (while a good one) 
warrants a separate FLIP.

Perhaps an intermediate approach would be to disable the use of Kryo for state 
serialization by default, and force a user to disregard warnings and explicitly 
enable it if they want to go down that path.

I’d assumed that there wasn’t a good way to migrate state stored with an older 
version of Kryo to a newer version - if you’ve solved that, kudos.

— Ken


On May 29, 2023, at 2:21 PM, Kurt Ostfeld kurtostf...@proton.me.INVALID wrote:

Hi everyone. I would like to start the discussion thread for FLIP-317: Upgrade 
Kryo from 2.24.0 to 5.5.0 [1].

There is a pull-request associated with this linked in the FLIP.

I'd particularly like to hear about:

- Chesnay Schepler's request to consider removing Kryo serializers from the 
execution config. Is this a reasonable task to add into this FLIP? Is there 
consensus on how to resolve that? Would that be better addressed in a separate 
future FLIP after the Kryo upgrade FLIP is completed?

- Backwards compatibility. The automated CI tests have a lot of backwards 
compatibility tests that are passing. I also wrote a Flink application with 
keyed state using custom Kryo v2 serializers and then an upgraded version with 
both Kryo v2 and Kryo v5 serializers to stress test the upgrade process. I'd 
like to hear about additional scenarios that need to be tested.

- Is this worth pursuing or is the Flink project looking to go in a different 
direction? I'd like to do some more work on the pull request if this is being 
seriously considered for adoption.

I'm looking forward to hearing everyone's feedback and suggestions.

Thank you,
Kurt

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-317%3A+Upgrade+Kryo+from+2.24.0+to+5.5.0

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch


Reply via email to