Hi Daniel,

This would indeed greatly reduce the duplicate processing on failbacks.

A few questions:

  1.  Adding another offset-sync store might be memory-intensive. Would it make
sense (if possible) to filter the topics it holds based on the
reverseCheckpointingTopicFilter?
  2.  Would it make sense to add a reverseCheckpointingGroupFilter as well, so 
that one can control not just the topics for reverse checkpointing but also the 
groups?
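
To illustrate question 1, here is a minimal sketch of what filtering at ingestion time could look like. It assumes a hypothetical in-memory store that keeps only the latest sync per topic-partition. ReverseOffsetSyncStore, OffsetSync and handle are illustrative names, not MirrorMaker classes, and the filter is modeled as a plain Predicate<String> rather than whatever type the KIP ends up using.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: these class and method names are illustrative and are not
// part of MirrorMaker. The store keeps only the latest sync per topic-partition.
public class ReverseOffsetSyncStore {

    // One offset sync: an upstream offset and the downstream offset it maps to.
    public record OffsetSync(String topic, int partition, long upstreamOffset, long downstreamOffset) {}

    private final Predicate<String> reverseCheckpointingTopicFilter;
    private final Map<String, OffsetSync> latestSyncs = new HashMap<>();

    public ReverseOffsetSyncStore(Predicate<String> reverseCheckpointingTopicFilter) {
        this.reverseCheckpointingTopicFilter = reverseCheckpointingTopicFilter;
    }

    // Keeps the sync only if its topic passes the filter; returns whether it was kept.
    public boolean handle(OffsetSync sync) {
        if (!reverseCheckpointingTopicFilter.test(sync.topic())) {
            // Topic is never reverse-checkpointed, so its syncs need no memory.
            return false;
        }
        latestSyncs.put(sync.topic() + "-" + sync.partition(), sync);
        return true;
    }

    // Number of topic-partitions currently tracked.
    public int size() {
        return latestSyncs.size();
    }

    public static void main(String[] args) {
        // Example filter: only topics starting with "orders" are reverse-checkpointed.
        Pattern include = Pattern.compile("orders.*");
        ReverseOffsetSyncStore store =
                new ReverseOffsetSyncStore(topic -> include.matcher(topic).matches());
        store.handle(new OffsetSync("orders", 0, 100L, 98L));
        store.handle(new OffsetSync("orders", 0, 200L, 197L)); // replaces the previous sync
        store.handle(new OffsetSync("clicks", 0, 50L, 50L));   // dropped by the filter
        System.out.println(store.size()); // prints 1
    }
}
```

Filtering when syncs are read means excluded topics never occupy memory, at the cost of having to re-read the offset-syncs topic if the filter changes.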

Do I understand correctly that the replication flow itself must be
bidirectional, but the topic replication need not be? If so, this seems to
unlock another use case. With this change, one can more confidently fail over
a consumer group to the passive cluster and back (in the context of the topic
itself) without much reprocessing; I can see this being useful when a cluster
gets busy at times. Or one could even have a new consumer group consume
messages from the passive cluster for a while before “failing it over” to the
active cluster. Is this something you would recommend using the feature for?

Best,
Vidor


On 2024/10/25 15:31:50 Dániel Urbán wrote:
> Hi,
>
> One more update. As I was working on the PR, I realized that the only way
> to support IdentityReplicationPolicy is to add an extra topic filter to the
> checkpointing. I updated the KIP accordingly.
> I also opened a draft PR to demonstrate the proposed changes:
> https://github.com/apache/kafka/pull/17593
>
> Daniel
>
> Dániel Urbán <ur...@gmail.com> wrote (on Thu, Oct 24, 2024, 15:22):
>
> > Hi all,
> > Just a bump/minor update here:
> > As I've started working on a POC of the proposed solution, I've realised
> > that the hard requirement related to the ReplicationPolicy implementation
> > can be eliminated, and I've updated the KIP accordingly.
> > Daniel
> >
> > Dániel Urbán <ur...@gmail.com> wrote (on Mon, Oct 21, 2024, 16:18):
> >
> >> Hi Mickael,
> >> Good point, I renamed the KIP and this thread:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing+in+MirrorMaker
> >> Thank you,
> >> Daniel
> >>
> >> Mickael Maison <mi...@gmail.com> wrote (on Mon, Oct 21, 2024, 15:22):
> >>
> >>> Hi Daniel,
> >>>
> >>> I've not had time to take a close look at the KIP but my initial
> >>> feedback would be to adjust the name to make it clear it's about
> >>> MirrorMaker.
> >>> The word "checkpoint" has several meanings in Kafka and from the
> >>> current KIP name it's not clear if it's about KRaft, Streams or
> >>> Connect.
> >>>
> >>> Thanks,
> >>> Mickael
> >>>
> >>> On Mon, Oct 21, 2024 at 2:55 PM Dániel Urbán <ur...@gmail.com> wrote:
> >>> >
> >>> > Hi Viktor,
> >>> >
> >>> > Thank you for the comments!
> >>> >
> >>> > SVV1: I think the feature has some performance implications. If
> >>> > reverse checkpointing is enabled, task startup may be slower, since
> >>> > it will need to consume from a second offset-syncs topic, and it will
> >>> > also use more memory to keep the second offset-sync history.
> >>> > Additionally, it is possible to have an offset-syncs topic present
> >>> > without an actual opposite flow being active. I think only users can
> >>> > tell whether reverse checkpointing should be active, and they should
> >>> > be the ones opting in to the higher resource usage.
> >>> >
> >>> > SVV2: I mention DefaultReplicationPolicy only to provide examples; I
> >>> > don't think it is required. The actual requirement related to the
> >>> > ReplicationPolicy is that the policy should be able to correctly tell
> >>> > which topic was replicated from which cluster. Because of this,
> >>> > IdentityReplicationPolicy would not work, but DefaultReplicationPolicy,
> >>> > or any other ReplicationPolicy implementation with a correctly
> >>> > implemented "topicSource" method, should work. I will make an explicit
> >>> > note of this in the KIP.
> >>> >
> >>> > Thank you,
> >>> > Daniel
> >>> >
> >>> > Viktor Somogyi-Vass <vi...@cloudera.com.invalid> wrote (on Fri, Oct 18, 2024, 17:28):
> >>> >
> >>> > > Hey Dan,
> >>> > >
> >>> > > I think this is a very useful idea. Two questions:
> >>> > >
> >>> > > SVV1: Do you think we need the feature flag at all? I know that not
> >>> > > having this flag may technically render the KIP unnecessary (though
> >>> > > it may still be useful to discuss this topic and reach a consensus).
> >>> > > As you wrote in the KIP, we may be able to look up the target and
> >>> > > source topics, and if we can do this, we can probably detect whether
> >>> > > the replication is one-way or prefixless (identity). That means we
> >>> > > don't need this flag to control when to use the feature. Then it is
> >>> > > really just there as a switch to turn the feature on and off if
> >>> > > needed, but I'm not really sure there is a great risk in just
> >>> > > enabling it by default. If we really just turn back the B -> A
> >>> > > checkpoints and save them in the A -> B flow, then maybe it's not too
> >>> > > risky, and users would get this immediately just by upgrading.
> >>> > >
> >>> > > SVV2: You write that we need DefaultReplicationPolicy to use this
> >>> > > feature, but most of the functionality is available at the interface
> >>> > > level in ReplicationPolicy. Is anything missing there, and if so,
> >>> > > what do you think about pulling it into the interface? If this
> >>> > > improvement only works with the default replication policy, it is
> >>> > > somewhat limiting, as users may have their own policies for various
> >>> > > reasons; if we can make it work at the interface level, we could
> >>> > > provide this feature to everyone. Of course, there can be replication
> >>> > > policies, like the identity one, that by design disallow this feature,
> >>> > > but for those, see my previous point.
> >>> > >
> >>> > > Best,
> >>> > > Viktor
> >>> > >
> >>> > > On Fri, Oct 18, 2024 at 3:30 PM Dániel Urbán <ur...@gmail.com> wrote:
> >>> > >
> >>> > > > Hi everyone,
> >>> > > >
> >>> > > > I'd like to start the discussion on KIP-1098: Reverse Checkpointing
> >>> > > > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing),
> >>> > > > which aims to minimize message reprocessing for consumers in
> >>> > > > failbacks.
> >>> > > >
> >>> > > > TIA,
> >>> > > > Daniel
> >>> > > >
> >>> > >
> >>>
> >>
>
