Hi Daniel,

SVV1. Fair points about the performance impact. The next question is
whether we can quantify it somehow, i.e. does it scale with the number of
topics to reverse checkpoint, the offsets emitted, etc.?

I'll do one more pass on the KIP in the following days but I wanted to
reply to you with what I have so far to keep this going.

Best,
Viktor

On Fri, Oct 25, 2024 at 5:32 PM Dániel Urbán <urb.dani...@gmail.com> wrote:

> Hi,
>
> One more update. As I was working on the PR, I realized that the only way
> to support IdentityReplicationPolicy is to add an extra topic filter to the
> checkpointing. I updated the KIP accordingly.
> I also opened a draft PR to demonstrate the proposed changes:
> https://github.com/apache/kafka/pull/17593
>
> Daniel
>
> Dániel Urbán <urb.dani...@gmail.com> wrote (on Thu, Oct 24, 2024,
> 15:22):
>
> > Hi all,
> > Just a bump/minor update here:
> > As I started working on a POC of the proposed solution, I realised
> > that the hard requirement related to the ReplicationPolicy implementation
> > can be eliminated, and I updated the KIP accordingly.
> > Daniel
> >
> > Dániel Urbán <urb.dani...@gmail.com> wrote (on Mon, Oct 21, 2024,
> > 16:18):
> >
> >> Hi Mickael,
> >> Good point, I renamed the KIP and this thread:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing+in+MirrorMaker
> >> Thank you,
> >> Daniel
> >>
> >> Mickael Maison <mickael.mai...@gmail.com> wrote (on Mon, Oct 21, 2024,
> >> 15:22):
> >>
> >>> Hi Daniel,
> >>>
> >>> I've not had time to take a close look at the KIP but my initial
> >>> feedback would be to adjust the name to make it clear it's about
> >>> MirrorMaker.
> >>> The word "checkpoint" has several meanings in Kafka and from the
> >>> current KIP name it's not clear if it's about KRaft, Streams or
> >>> Connect.
> >>>
> >>> Thanks,
> >>> Mickael
> >>>
> >>> On Mon, Oct 21, 2024 at 2:55 PM Dániel Urbán <urb.dani...@gmail.com>
> >>> wrote:
> >>> >
> >>> > Hi Viktor,
> >>> >
> >>> > Thank you for the comments!
> >>> >
> >>> > SVV1: I think the feature has some performance implications. If
> >>> > reverse checkpointing is enabled, task startup may be slower, since
> >>> > the task will need to consume from a second offset-syncs topic; it
> >>> > will also use more memory to keep the second offset-sync history.
> >>> > Additionally, an offset-syncs topic can be present without an
> >>> > actual, opposite flow being active. I think only users can tell
> >>> > whether reverse checkpointing should be active, and they should be
> >>> > the ones opting in to the higher resource usage.
> >>> >
> >>> > SVV2: I mention the DefaultReplicationPolicy to provide examples. I
> >>> > don't think it is required. The actual requirement related to the
> >>> > ReplicationPolicy is that the policy should be able to correctly
> >>> > tell which topic was replicated from which cluster. Because of this,
> >>> > IdentityReplicationPolicy would not work, but
> >>> > DefaultReplicationPolicy, or any other ReplicationPolicy
> >>> > implementation with a correctly implemented "topicSource" method,
> >>> > should work. I will make an explicit note of this in the KIP.
> >>> >
> >>> > Thank you,
> >>> > Daniel
> >>> >
> >>> > Viktor Somogyi-Vass <viktor.somo...@cloudera.com.invalid> wrote
> >>> > (on Fri, Oct 18, 2024, 17:28):
> >>> >
> >>> > > Hey Dan,
> >>> > >
> >>> > > I think this is a very useful idea. Two questions:
> >>> > >
> >>> > > SVV1: Do you think we need the feature flag at all? I know that
> >>> > > not having this flag may technically render the KIP unnecessary
> >>> > > (however, it may still be useful to discuss this topic and reach
> >>> > > a consensus). As you wrote in the KIP, we may be able to look up
> >>> > > the target and source topics, and if we can do this, we can
> >>> > > probably detect whether the replication is one-way or prefixless
> >>> > > (identity). That means we don't need this flag to control when
> >>> > > we want to use this. Then it is really just there to act as
> >>> > > something that can turn the feature on and off if needed, but I'm
> >>> > > not really sure there is a great risk in just enabling this by
> >>> > > default. If we really just turn back the B -> A checkpoints and
> >>> > > save them in the A -> B, then maybe it's not too risky and users
> >>> > > would get this immediately by just upgrading.
> >>> > >
> >>> > > SVV2: You write that we need DefaultReplicationPolicy to use this
> >>> > > feature, but most of the functionality is there at the interface
> >>> > > level in ReplicationPolicy. Is there anything missing from
> >>> > > there, and if so, what do you think about pulling it into the
> >>> > > interface? If this improvement only works with the default
> >>> > > replication policy, then it's somewhat limiting, as users may have
> >>> > > their own policy for various reasons, but if we can make it work
> >>> > > at the interface level, then we could provide this feature to
> >>> > > everyone. Of course there can be replication policies like the
> >>> > > identity one that by design disallow this feature, but for that,
> >>> > > see my previous point.
> >>> > >
> >>> > > Best,
> >>> > > Viktor
> >>> > >
> >>> > > On Fri, Oct 18, 2024 at 3:30 PM Dániel Urbán
> >>> > > <urb.dani...@gmail.com> wrote:
> >>> > >
> >>> > > > Hi everyone,
> >>> > > >
> >>> > > > I'd like to start the discussion on KIP-1098: Reverse
> >>> > > > Checkpointing
> >>> > > > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing),
> >>> > > > which aims to minimize message reprocessing for consumers during
> >>> > > > failbacks.
> >>> > > >
> >>> > > > TIA,
> >>> > > > Daniel
> >>> > > >
> >>> > >
> >>>
> >>
>
