curcur edited a comment on pull request #16606:
URL: https://github.com/apache/flink/pull/16606#issuecomment-917898103


   Roman and I had several long discussions about the interface between 
`Materialization` and `ChangelogKeyedStateBackend`. I am documenting them here for future 
reference.
   
   The main difference between the options is who is responsible for **keeping and updating** the 
state related to `ChangelogKeyedStateBackend`, denoted here as `ChangelogSnapshotState`. 
It consists of three parts:
   
   * the materialized snapshot from the underlying delegated state backend
   * the non-materialized part in the current changelog
   * the non-materialized changelog from previous logs (before failover or 
rescaling)
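
   For reference, the three parts could be pictured as a small immutable holder. This is only a sketch: the class name, the `String` handles, and the `afterMaterialization` method are hypothetical placeholders, not the types used in the PR, and the rule that a finished materialization drops the previous logs is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: names and String handles are placeholders, not Flink's real types.
final class SnapshotStateSketch {
    // Part 1: materialized snapshot from the delegated state backend.
    private final List<String> materialized;
    // Part 2: non-materialized part in the current changelog.
    private final List<String> currentLog;
    // Part 3: non-materialized changelog from previous logs (before failover or rescaling).
    private final List<String> previousLogs;

    SnapshotStateSketch(List<String> materialized, List<String> currentLog, List<String> previousLogs) {
        this.materialized = Collections.unmodifiableList(new ArrayList<>(materialized));
        this.currentLog = Collections.unmodifiableList(new ArrayList<>(currentLog));
        this.previousLogs = Collections.unmodifiableList(new ArrayList<>(previousLogs));
    }

    List<String> materialized() { return materialized; }
    List<String> currentLog() { return currentLog; }
    List<String> previousLogs() { return previousLogs; }

    // Assumption: a finished materialization replaces part 1, keeps only the
    // changes written after it in part 2, and makes part 3 obsolete.
    SnapshotStateSketch afterMaterialization(List<String> newSnapshot, List<String> remainingCurrentLog) {
        return new SnapshotStateSketch(newSnapshot, remainingCurrentLog, Collections.emptyList());
    }
}
```

   The three options below differ only in which component holds an object like this and who is allowed to replace it.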
   
   We've discussed and tried out three versions:
   
   1. `Materialization` coupled with `ChangelogKeyedStateBackend`, 
   implemented in commit **fbd1e2d38ae6353506ceac8eb074bd24bdb29b62**, 
   where `PeriodicMaterializer` is an inner class of 
`ChangelogKeyedStateBackend`.
        - Pros: state is shared and easy to reason about.
        - Cons: the coupling is too tight; neither the keyed state backend 
nor the materializer is flexible or extensible.
   
        This approach was discarded early in the 
discussion, so I will not go into further detail.
        
   2. `ChangelogSnapshotState` is kept in the materializer. The materializer is 
conceptually a way to connect the delegated state backend to the changelog, 
and the connection is made through `ChangelogSnapshotState`, as described above. 
   Implemented in commit **3421b81c2502f61112bd131a7336c16e3dd30925**.
   
       - Pros: 
         1. Good isolation and extensibility. The changelog 
keyed state backend can be viewed clearly as four parts: 
            - log writer, delegated state backend, materializer, and the wrapping 
`ChangelogKeyedStateBackend` that performs the double writes
         2. More natural to understand and implement.
            - State is updated by the materializer and is accessible to 
`ChangelogKeyedStateBackend`
            - The materializer is part of `ChangelogKeyedStateBackend`
   
       - Cons: 
          1. According to Roman, `ChangelogKeyedStateBackend` has implicit state 
(such as state double writes) besides the three parts mentioned above. 
          2. Optimizations (such as batched writes) would need to update the materializer as 
well.
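
   A minimal sketch of the ownership in option 2, assuming a single `String` stands in for `ChangelogSnapshotState`. The class and method names are hypothetical and are not taken from the commit; the point is only that the materializer holds and updates the state while the backend reads it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of option 2: the materializer owns the snapshot state.
final class OwningMaterializerSketch {
    // A String stands in for ChangelogSnapshotState (assumption for brevity).
    private final AtomicReference<String> snapshotState = new AtomicReference<>("empty");

    // The materializer is the only writer of the state.
    void materialize(String delegatedBackendContents) {
        snapshotState.set("materialized:" + delegatedBackendContents);
    }

    String currentState() {
        return snapshotState.get();
    }
}

// The wrapping backend contains the materializer and only reads the state from it.
final class ReadingBackendSketch {
    private final OwningMaterializerSketch materializer;

    ReadingBackendSketch(OwningMaterializerSketch materializer) {
        this.materializer = materializer;
    }

    // A checkpoint would combine this with the backend's own, implicit state,
    // which is where con 1 above comes from.
    String stateForCheckpoint() {
        return materializer.currentState();
    }
}
```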
   
   3. `ChangelogSnapshotState` and its updates are kept in 
`ChangelogKeyedStateBackend`. Materialization works as a stateless Materialization 
Manager that provides utility functions. 
   Implemented in commit **75dec43024d91b896d488a4c9e979d486228398a**.
       - Pros:
          1. All state is wrapped in `ChangelogKeyedStateBackend`.
          2. Conceptually, it also works naturally.
          
       - Cons:
         Circular construction. `Materialization Manager` needs access to 
`ChangelogKeyedStateBackend` to update `ChangelogSnapshotState`, while 
         `ChangelogKeyedStateBackend` is created from 
`StateBackend#createKeyedStateBackend`. 
          
          To avoid the circular construction, `Materialization Manager` has to be 
exposed at the time `ChangelogKeyedStateBackend` is created. 
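
   One way to picture option 3 and its construction order is sketched below. It assumes a small callback interface so that the manager can be created first, without a backend reference, and then handed to the backend when the backend is created. All names are hypothetical, a `String` stands in for `ChangelogSnapshotState`, and this is not necessarily how the commit resolves the cycle.

```java
// Hypothetical callback through which the snapshot state gets updated.
interface SnapshotStateUpdaterSketch {
    void updateSnapshotState(String materializedSnapshot);
}

// Stateless manager: it keeps no snapshot state and needs no backend at construction.
final class StatelessMaterializationManagerSketch {
    void materialize(String delegatedBackendContents, SnapshotStateUpdaterSketch target) {
        // Produce the materialized result and hand it back to whoever owns the state.
        target.updateSnapshotState("materialized:" + delegatedBackendContents);
    }
}

// The backend owns the state; the manager is passed in when the backend is created.
final class StateOwningBackendSketch implements SnapshotStateUpdaterSketch {
    private final StatelessMaterializationManagerSketch manager;
    private String snapshotState = "empty"; // stands in for ChangelogSnapshotState

    StateOwningBackendSketch(StatelessMaterializationManagerSketch manager) {
        this.manager = manager;
    }

    @Override
    public void updateSnapshotState(String materializedSnapshot) {
        this.snapshotState = materializedSnapshot;
    }

    void triggerMaterialization(String delegatedBackendContents) {
        manager.materialize(delegatedBackendContents, this);
    }

    String snapshotState() {
        return snapshotState;
    }
}
```

   With this shape, the manager has to exist before the backend is constructed, which matches the constraint described above.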
   
   @rkhachatryan what do you think, Roman?



