curcur edited a comment on pull request #16606:
URL: https://github.com/apache/flink/pull/16606#issuecomment-917898103


   Roman and I had several long discussions on interfaces between 
Materialization and 
[`ChangelogKeyedStatebackend`](https://github.com/apache/flink/commit/3421b81c2502f61112bd131a7336c16e3dd30925#diff-e071e8a89527c24be4ee5ee342ad7d47c870170ef915d1407d18e998f7847f16L108).
 Document here for future reference.
   
   The main difference is between who is responsible to **keep and update** 
`ChangelogKeyedStatebackend`'s related states, denoted as 
[`ChangelogSnapshotState`](https://github.com/apache/flink/commit/3421b81c2502f61112bd131a7336c16e3dd30925#diff-79beab2a7108881b64ac4b482a6446e06623efa7e19ac4b0018c7cf20c35e88aR39)
 including three parts:
     - materialized snapshot from the underlying delegated state backend
     - non-materialized part in the current changelog
     - non-materialized changelog, from previous logs (before failover or 
rescaling) 
   
   We've discussed and tried out three different versions, list as follows:
   
   1.  Materialization coupled with `ChangelogKeyedStatebackend`, implemented 
as [an inner class 
`PeriodicMaterializer`](https://github.com/apache/flink/commit/fbd1e2d38ae6353506ceac8eb074bd24bdb29b62#diff-e071e8a89527c24be4ee5ee342ad7d47c870170ef915d1407d18e998f7847f16R659)
 of `ChangelogKeyedStatebackend`, 
      This version is implemented in commit 
**fbd1e2d38ae6353506ceac8eb074bd24bdb29b62**
       - Pros: states are shared, easy to reason about
       - Cons: Coupled too closely, not flexible or extendible for 
keyedstatebackend or materializer
   
       Not to mention further, this approach is discarded during early 
discussion.
        
   2. 
[`PeriodicMaterializer`](https://github.com/apache/flink/commit/3421b81c2502f61112bd131a7336c16e3dd30925#diff-4fbeeeabb5525b7f38fb36c670f35224bcd619e482b14451f20ae08ed7a8a43cR36)
 as a separate component of `ChangelogKeyedStatebackend`. 
`ChangelogSnapshotState` are kept in `PeriodicMaterializer`. 
`PeriodicMaterializer` is conceptually taken as a way to connect the underlying 
delegated state backend to changelog. The way to connect is through 
`ChangelogSnapshotState`, 
             - materialized snapshot from the underlying delegated state backend
             - non-materialized part in the current changelog, expressed using 
(expressed as SeqNo# in the log)
             - non-materialized changelog, from previous logs (before failover 
or rescaling)
       This version is implemented in commit 
**3421b81c2502f61112bd131a7336c16e3dd30925**
   
       - Pros: 
         1. Good isolation and extensibility. Clear view the 
ChangelogKeyedStatebackend as four parts: 
                 - log writer
                 - delegated statebackend
                 - materializer, and 
                 - the top wrapper changelogkeyedstatebackend for double state 
writing
         2. More natural to understand and implement.
             - State is updated by the materializer, and accessible by 
changelogKeyedStateBackend
             - `PeriodicMaterializer` is part of `ChangelogKeyedStateBackend`
   
       - Cons: 
          1. according to Roman, `ChangelogKeyedStateBackend` has implicit 
states (like state double writes) besides the three mentioned above; 
          2. optimization (like batched writes) need to update 
`PeriodicMaterializer` as well.
          (Roman has more complement updates for this part in this comment)
   
   3. `ChangelogSnapshotState` and its updates are kept in 
ChangelogKeyedStatBackend. Materialization works as a stateless Materialization 
Manager providing function utilities.
   Implemented as commit **75dec43024d91b896d488a4c9e979d486228398a**
       - Pros:
          1. All states are wrapped in ChangelogKeyedStatBackend
          2. Conceptually also works naturally
          
       - Cons:
         Circular constructor. `Materialization Manager` needs access to 
`ChangelogKeyedStatBackend` to update `ChangelogSnapshotState`
         `ChangelogKeyedStatBackend` is created from 
StateBackend#createKeyedStateBackend. 
          
          To avoid circular construction, `Materialization Manager` has to be 
exposed at the time creating ChangelogKeyedStatBackend. 
   
   @rkhachatryan what do you think Roman?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to