curcur edited a comment on pull request #16606: URL: https://github.com/apache/flink/pull/16606#issuecomment-917898103
Roman and I had several long discussions about the interfaces between `Materialization` and `ChangelogKeyedStateBackend`. I am documenting them here for future reference.

The main difference is who is responsible for **keeping and updating** the `ChangelogKeyedStateBackend`-related state, denoted as `ChangelogSnapshotState`. It consists of three parts:

- the materialized snapshot from the underlying delegated state backend
- the non-materialized part of the current changelog
- the non-materialized changelog from previous logs (before failover or rescaling)

We have discussed and tried out three versions:

1. `Materialization` coupled with `ChangelogKeyedStateBackend`, implemented in commit **fbd1e2d38ae6353506ceac8eb074bd24bdb29b62**, where `PeriodicMaterializer` is an inner class of `ChangelogKeyedStateBackend`.
   - Pros: state is shared, and the design is easy to reason about.
   - Cons: the two are coupled too closely, so neither the keyed state backend nor the materializer is flexible or extensible.

   This approach was discarded early in the discussion, so it is not considered further.
2. `ChangelogSnapshotState` is kept in the materializer, implemented in commit **3421b81c2502f61112bd131a7336c16e3dd30925**. The materializer is conceptually the bridge between the delegated state backend and the changelog; it connects them through `ChangelogSnapshotState`, as described above.
   - Pros:
     1. Good isolation and extensibility. The changelog keyed state backend splits cleanly into four parts: the log writer, the delegated state backend, the materializer, and the wrapping `ChangelogKeyedStateBackend` that does the double writing.
     2. More natural to understand and implement:
        - the state is updated by the materializer and is accessible to `ChangelogKeyedStateBackend`;
        - the materializer is part of `ChangelogKeyedStateBackend`.
   - Cons:
     1. According to Roman, `ChangelogKeyedStateBackend` has implicit state (such as state double writes) besides the three parts mentioned above.
     2. Optimizations (such as batched writes) would need to update the materializer as well.
3. `ChangelogSnapshotState` and its updates are kept in `ChangelogKeyedStateBackend`.
   `Materialization` works as a stateless Materialization Manager that provides utility functions, implemented in commit **75dec43024d91b896d488a4c9e979d486228398a**.
   - Pros:
     1. All state is wrapped in `ChangelogKeyedStateBackend`.
     2. Conceptually, it also works naturally.
   - Cons: circular construction. The `Materialization Manager` needs access to `ChangelogKeyedStateBackend` to update `ChangelogSnapshotState`, while `ChangelogKeyedStateBackend` is created by `StateBackend#createKeyedStateBackend`. To avoid circular construction, the `Materialization Manager` has to be exposed (already available) at the time `ChangelogKeyedStateBackend` is created.

@rkhachatryan what do you think, Roman?
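To make option 3 and its circular-construction problem more concrete, here is a minimal Java sketch under stated assumptions. Every class, interface, and method in it (`MaterializationTarget`, `triggerMaterialization`, `onMaterialized`, `nonMaterializedCount`, and so on) is hypothetical and illustrative, not Flink's actual API. Materialization also runs synchronously here rather than periodically and asynchronously. The sketch shows one possible way to break the cycle: the stateless manager is created first and passed to the backend's constructor, and the backend hands itself to the manager on each call. As a result, `ChangelogSnapshotState` is only ever replaced inside the backend.

```java
// Hypothetical sketch of option 3; none of these types are Flink's real API.

/** Immutable view of the three parts of the changelog backend's snapshot state. */
final class ChangelogSnapshotState {
    final String materializedSnapshot;    // handle of the delegated backend's snapshot (null if none yet)
    final String restoredNonMaterialized; // changelog restored from before failover/rescaling (null if none)
    final long materializedTo;            // current-log entries up to this sequence number are materialized

    ChangelogSnapshotState(String materializedSnapshot, String restoredNonMaterialized, long materializedTo) {
        this.materializedSnapshot = materializedSnapshot;
        this.restoredNonMaterialized = restoredNonMaterialized;
        this.materializedTo = materializedTo;
    }
}

/** What the manager needs from the backend; supplied per call, not via the constructor. */
interface MaterializationTarget {
    long lastAppendedSequenceNumber();
    String snapshotDelegatedBackend();
    void onMaterialized(String snapshotHandle, long upTo);
}

/** Stateless: holds neither a backend reference nor any ChangelogSnapshotState. */
final class MaterializationManager {
    void triggerMaterialization(MaterializationTarget target) {
        long upTo = target.lastAppendedSequenceNumber();   // fix the cut-off point first
        String handle = target.snapshotDelegatedBackend(); // snapshot the delegated backend
        target.onMaterialized(handle, upTo);               // the backend updates its own state
    }
}

final class ChangelogKeyedStateBackend implements MaterializationTarget {
    private final MaterializationManager manager;
    private ChangelogSnapshotState snapshotState; // owned and replaced only by this class
    private long lastSequenceNumber;
    private int materializations;

    // The manager already exists when the backend is built, so there is no constructor cycle.
    ChangelogKeyedStateBackend(MaterializationManager manager, ChangelogSnapshotState restored) {
        this.manager = manager;
        this.snapshotState = restored;
        this.lastSequenceNumber = restored.materializedTo;
    }

    /** Double write: the delegated-backend write and the log append are elided; only the log position advances. */
    void put(String key, String value) {
        lastSequenceNumber++;
    }

    void materialize() {
        manager.triggerMaterialization(this);
    }

    ChangelogSnapshotState snapshotState() {
        return snapshotState;
    }

    /** Number of current-log entries not yet covered by the materialized snapshot. */
    long nonMaterializedCount() {
        return lastSequenceNumber - snapshotState.materializedTo;
    }

    @Override
    public long lastAppendedSequenceNumber() {
        return lastSequenceNumber;
    }

    @Override
    public String snapshotDelegatedBackend() {
        return "snapshot-" + (++materializations); // stand-in for a real snapshot handle
    }

    @Override
    public void onMaterialized(String snapshotHandle, long upTo) {
        // The new snapshot covers everything up to upTo, including the restored changelog.
        snapshotState = new ChangelogSnapshotState(snapshotHandle, null, upTo);
    }
}
```

This sketch uses method injection instead of constructor injection. Because the manager never stores a backend reference, it can exist before `StateBackend#createKeyedStateBackend` returns. A real periodic trigger would still need some way to reach the backend after construction, for example by having the backend register itself with a scheduler once it is built.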