tzulitai opened a new pull request #7759: [FLINK-11485][FLINK-10897] POJO state 
schema evolution / migrate PojoSerializer to use new compatibility APIs
URL: https://github.com/apache/flink/pull/7759
 
 
   ## What is the purpose of the change
   
   This PR mainly solves FLINK-11485 (migrate `PojoSerializer` to use new 
serialization compatibility APIs), and in doing so also indirectly solves 
FLINK-10897 (support POJO state schema evolution).
   
   The new snapshot class for the `PojoSerializer` is now the 
`PojoSerializerSnapshot`.
   This new snapshot class has the following features:
   - Avoids Java serialization completely, including serialization of POJO 
fields and the POJO type class. This allows us to restore the snapshot properly 
when fields are added to or removed from the POJO type class.
   - Supports schema evolution of POJO types by signaling 
`compatibleAfterMigration` when fields are added to or removed from the target 
POJO type. During migration, the old POJO serializer obtained from 
`PojoSerializerSnapshot#restoreSerializer` reads old data into Java objects, 
which are then written again with the new POJO serializer. This old serializer 
is capable of reading and dropping values of fields that no longer exist.
   - Properly reconfigures the `PojoSerializer` by providing a new reconfigured 
instance via `compatibleWithReconfiguredSerializer`. This allows the 
implementation of the `PojoSerializer` to be immutable.
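   
   As a rough illustration of the compatibility signaling described above, the 
following plain-Java sketch compares the fields recorded in a snapshot with the 
fields of the current POJO class. `SchemaResult`, `PojoFieldCheck` and `resolve` 
are hypothetical names made up for this example and are not the classes in this 
PR. The sketch also assumes that a changed field type is rejected, and it leaves 
out the reconfiguration case entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the outcomes a snapshot can signal; not Flink's actual type.
enum SchemaResult { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION, INCOMPATIBLE }

final class PojoFieldCheck {

    // Both maps go from field name to field type name.
    static SchemaResult resolve(Map<String, String> snapshotFields,
                                Map<String, String> currentFields) {
        boolean fieldsAddedOrRemoved = false;
        for (Map.Entry<String, String> old : snapshotFields.entrySet()) {
            String currentType = currentFields.get(old.getKey());
            if (currentType == null) {
                fieldsAddedOrRemoved = true; // field was removed from the POJO
            } else if (!currentType.equals(old.getValue())) {
                return SchemaResult.INCOMPATIBLE; // assumption: type changes are rejected
            }
        }
        for (String name : currentFields.keySet()) {
            if (!snapshotFields.containsKey(name)) {
                fieldsAddedOrRemoved = true; // field was added to the POJO
            }
        }
        return fieldsAddedOrRemoved
                ? SchemaResult.COMPATIBLE_AFTER_MIGRATION
                : SchemaResult.COMPATIBLE_AS_IS;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        snapshot.put("id", "long");
        snapshot.put("legacyFlag", "boolean");

        Map<String, String> current = new LinkedHashMap<>();
        current.put("id", "long");
        current.put("email", "java.lang.String");

        System.out.println(resolve(snapshot, current)); // COMPATIBLE_AFTER_MIGRATION
    }
}
```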
   
   Please see below for the detailed list of changes.
   
   ## Brief change log
   
   - 3bd3f73 to 843088c: Cherry-picking / refactoring of the `OptionalMap` 
utility introduced by @igalshilman in #7496. This is refactored because it will 
be useful for the `PojoSerializerSnapshot` as well.
   
   - 1ebaf91: Refactors the logic for resolving the overall compatibility 
result across the nested serializers of a composite serializer out of the 
`CompositeTypeSerializerSnapshot` class and into a common utility. The 
`PojoSerializer` also has nested serializers (e.g. field serializers, 
registered subclass serializers), and will make use of this utility.
   
   - a84c54a: Introduces a new `PojoSerializerSnapshotData` class. This is a 
container class for the actual snapshotted content of a `PojoSerializer`. The 
important bits here are the reading / writing of the snapshot content, as well 
as how fields / classes that are missing from the snapshot content at restore 
time are handled. Please see the class for the serialization format and content 
of the snapshotted data.
   
   - e77e828: Introduces a new `PojoSerializerSnapshot` class, to be the new 
snapshot class for the `PojoSerializer`. The read / write of the snapshot is 
delegated to the `PojoSerializerSnapshotData`, so the important part here is 
the compatibility resolution logic in `resolveSchemaCompatibility`, as well as 
the creation of the restore serializer in the `restoreSerializer` method. The 
restored serializer should be able to handle fields that are missing in the new 
POJO type; values of missing fields will be read and simply dropped.
   
   - 44bfc4b: This is the main change that lets the `PojoSerializer` use the 
new snapshot class instead of the now deprecated 
`PojoSerializerConfigSnapshot`. This also deals with how the 
`PojoSerializerConfigSnapshot` delegates compatibility checks to the new 
snapshot class when restoring from savepoints taken with versions earlier than 
1.8.0. Since the new snapshot class properly handles POJO schema changes, this 
commit essentially also enables POJO state schema evolution.
   
   - 0941b45: Touches `PojoSerializerTest` and `PojoSerializerUpgradeTest`. 
Changes in the `PojoSerializerTest` confirm that the new `PojoSerializer` 
actually handles reconfiguration cases properly. Changes in the 
`PojoSerializerUpgradeTest` mostly remove the expectation of errors when the 
schema of a POJO type is changed. Removing those expectations confirms that 
POJO schemas can indeed be evolved now.
   
   - 49d9bce: Documents POJO schema evolution in the "Working with State" docs.
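   
   The idea behind the shared utility from 1ebaf91, folding the results of 
several nested serializers into one overall result, can be sketched in plain 
Java as follows. `NestedResult`, `NestedCompatibilitySketch` and `combine` are 
hypothetical names for this example only. The precedence used here 
(incompatible, then migration, then reconfiguration, then as-is) is an 
assumption for illustration, and the real utility may carry more information, 
such as the reconfigured nested serializers themselves.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical outcome type, declared in increasing order of severity.
enum NestedResult {
    COMPATIBLE_AS_IS,
    COMPATIBLE_WITH_RECONFIGURED_SERIALIZER,
    COMPATIBLE_AFTER_MIGRATION,
    INCOMPATIBLE
}

final class NestedCompatibilitySketch {

    // Returns the most severe result found among the nested serializers.
    static NestedResult combine(List<NestedResult> nestedResults) {
        NestedResult overall = NestedResult.COMPATIBLE_AS_IS;
        for (NestedResult result : nestedResults) {
            if (result == NestedResult.INCOMPATIBLE) {
                return NestedResult.INCOMPATIBLE; // one incompatible nested serializer is enough
            }
            if (result.compareTo(overall) > 0) {
                overall = result; // enum comparison follows declaration order
            }
        }
        return overall;
    }

    public static void main(String[] args) {
        List<NestedResult> fieldResults = Arrays.asList(
                NestedResult.COMPATIBLE_AS_IS,
                NestedResult.COMPATIBLE_WITH_RECONFIGURED_SERIALIZER,
                NestedResult.COMPATIBLE_AFTER_MIGRATION);
        System.out.println(combine(fieldResults)); // COMPATIBLE_AFTER_MIGRATION
    }
}
```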
   
   ## Verifying this change
   
   - Changes to `PojoSerializerTest` confirm that the new `PojoSerializer` 
handles compatibility resolution properly.
   - Changes to `PojoSerializerUpgradeTest` confirm that POJO fields can be 
removed / added when restoring from savepoints.
   - A new `PojoSerializerSnapshotMigrationTest` confirms that old savepoints 
with the legacy `PojoSerializerConfigSnapshot` can be smoothly migrated to the 
new `PojoSerializerSnapshot` in 1.8.0.
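   
   For readers unfamiliar with the removed / added field case these tests 
exercise, here is a toy round trip in plain Java. It is not the format used by 
`PojoSerializerSnapshotData` or the `PojoSerializer`: every field is assumed to 
be a non-null string, new fields default to an empty string, and 
`ReadAndDropSketch`, `write` and `readOld` are hypothetical names. What it does 
show is that the restoring reader must still consume the bytes of a removed 
field in order to stay aligned, and then simply drops the value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadAndDropSketch {

    // Writes the values of `fields`, in that order, as UTF strings.
    static byte[] write(List<String> fields, Map<String, String> record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : fields) {
                out.writeUTF(record.get(field));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads data written with the old field layout into a record of the current layout.
    static Map<String, String> readOld(List<String> oldFields,
                                       List<String> currentFields,
                                       byte[] data) {
        Map<String, String> record = new LinkedHashMap<>();
        for (String field : currentFields) {
            record.put(field, ""); // assumption: added fields start with a default value
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (String field : oldFields) {
                String value = in.readUTF(); // always consume the bytes to stay aligned
                if (currentFields.contains(field)) {
                    record.put(field, value);
                } // else: the field no longer exists, so the value is dropped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> oldFields = Arrays.asList("id", "name", "legacy");
        List<String> newFields = Arrays.asList("id", "name", "email");

        Map<String, String> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", "1");
        oldRecord.put("name", "a");
        oldRecord.put("legacy", "x");

        // Migration: read with the old layout, then write again with the new one.
        Map<String, String> migrated = readOld(oldFields, newFields, write(oldFields, oldRecord));
        byte[] rewritten = write(newFields, migrated);
        System.out.println(migrated + " -> " + rewritten.length + " bytes");
    }
}
```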
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency):  **no**
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
     - The serializers: **yes**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **yes - POJO state 
schema evolution**
     - If yes, how is the feature documented? **docs**
   

