[jira] [Commented] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

Greg Harris (Jira) Mon, 29 Apr 2024 11:14:03 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842121#comment-17842121
 ]


Greg Harris commented on KAFKA-15905:
-------------------------------------

Rather than re-reading the checkpoints topic, we could add source offsets to 
the emitted records, effectively writing each checkpoint twice. The framework 
could manage the offset consumer on our behalf, simplifying the mirror 
checkpoint task and preventing the need for credentials to consume from the 
checkpoints topic.

> Restarts of MirrorCheckpointTask should not permanently interrupt offset 
> translation
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15905
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15905
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>    Affects Versions: 3.6.0
>            Reporter: Greg Harris
>            Priority: Major
>
> Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
> of checkpointsPerConsumerGroup, which limits offset translation to records 
> mirrored after the latest restart.
> For example, if 1000 records are mirrored and the OffsetSyncs are read by 
> MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
> happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
> records are mirrored, translation can happen at the ~1500th record, but no 
> longer at the ~500th record.
> Context:
> Before KAFKA-13659, MM2 made translation decisions based on the 
> incompletely-initialized OffsetSyncStore, and the checkpoint could appear to 
> go backwards temporarily during restarts. To fix this, we forced the 
> OffsetSyncStore to initialize completely before translation could take place, 
> ensuring that the latest OffsetSync had been read, and thus providing the 
> most accurate translation.
> Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
> Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
> allow for translation of earlier offsets. This came with the caveat that the 
> cache's sparseness allowed translations to go backwards permanently. To 
> prevent this behavior, a cache of the latest Checkpoints was kept in the 
> MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
> translation remained restricted to the fully-initialized OffsetSyncStore.
> Effectively, the MirrorCheckpointTask ensures that it translates based on an 
> OffsetSync emitted during it's lifetime, to ensure that no previous 
> MirrorCheckpointTask emitted a later sync. If we can read the checkpoints 
> emitted by previous generations of MirrorCheckpointTask, we can still ensure 
> that checkpoints are monotonic, while allowing translation further back in 
> history.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

Reply via email to