Hi,

One more update. While working on the PR, I realized that the only way to support IdentityReplicationPolicy is to add an extra topic filter to the checkpointing. I updated the KIP accordingly. I also opened a draft PR to demonstrate the proposed changes: https://github.com/apache/kafka/pull/17593

Daniel

Dániel Urbán <urb.dani...@gmail.com> wrote (on Thu, Oct 24, 2024, 15:22):

> Hi all,
>
> Just a bump/minor update here:
> As I've started working on a POC of the proposed solution, I've realised
> that the hard requirement related to the ReplicationPolicy implementation
> can be eliminated, and updated the KIP accordingly.
>
> Daniel
>
> Dániel Urbán <urb.dani...@gmail.com> wrote (on Mon, Oct 21, 2024, 16:18):
>
>> Hi Mickael,
>>
>> Good point, I renamed the KIP and this thread:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing+in+MirrorMaker
>>
>> Thank you,
>> Daniel
>>
>> Mickael Maison <mickael.mai...@gmail.com> wrote (on Mon, Oct 21, 2024, 15:22):
>>
>>> Hi Daniel,
>>>
>>> I've not had time to take a close look at the KIP but my initial
>>> feedback would be to adjust the name to make it clear it's about
>>> MirrorMaker.
>>> The word "checkpoint" has several meanings in Kafka and from the
>>> current KIP name it's not clear if it's about KRaft, Streams or
>>> Connect.
>>>
>>> Thanks,
>>> Mickael
>>>
>>> On Mon, Oct 21, 2024 at 2:55 PM Dániel Urbán <urb.dani...@gmail.com> wrote:
>>> >
>>> > Hi Viktor,
>>> >
>>> > Thank you for the comments!
>>> >
>>> > SVV1: I think the feature has some performance implications. If
>>> > reverse checkpointing is enabled, task startup will possibly be
>>> > slower, since it will need to consume from a second offset-syncs
>>> > topic; and it will also use more memory, to keep the second
>>> > offset-sync history. Additionally, it is also possible to have an
>>> > offset-syncs topic present without an actual, opposite flow being
>>> > active. I think only users can tell if the reverse checkpointing
>>> > should be active, and they should be the ones opting in for the
>>> > higher resource usage.
>>> >
>>> > SVV2: I mention the DefaultReplicationPolicy to provide examples. I
>>> > don't think it is required. The actual requirement related to the
>>> > ReplicationPolicy is that the policy should be able to correctly
>>> > tell which topic was replicated from which cluster. Because of this,
>>> > IdentityReplicationPolicy would not work, but
>>> > DefaultReplicationPolicy, or any other ReplicationPolicy
>>> > implementation with a correctly implemented "topicSource" method,
>>> > should work. I will make an explicit note of this in the KIP.
>>> >
>>> > Thank you,
>>> > Daniel
>>> >
>>> > Viktor Somogyi-Vass <viktor.somo...@cloudera.com.invalid> wrote
>>> > (on Fri, Oct 18, 2024, 17:28):
>>> >
>>> > > Hey Dan,
>>> > >
>>> > > I think this is a very useful idea. Two questions:
>>> > >
>>> > > SVV1: Do you think we need the feature flag at all? I know that
>>> > > not having this flag may technically render the KIP unnecessary
>>> > > (however it may still be useful to discuss this topic and create
>>> > > a consensus). As you wrote in the KIP, we may be able to look up
>>> > > the target and source topics and if we can do this, we can
>>> > > probably detect if the replication is one-way or prefixless
>>> > > (identity). So that means we don't need this flag to control when
>>> > > we want to use this. Then it is really just there to act as
>>> > > something that can turn the feature on and off if needed, but I'm
>>> > > not really sure if there is a great risk in just enabling this by
>>> > > default. If we really just turn back the B -> A checkpoints and
>>> > > save them in the A -> B, then maybe it's not too risky and users
>>> > > would get this immediately by just upgrading.
>>> > >
>>> > > SVV2: You write that we need DefaultReplicationPolicy to use this
>>> > > feature, but most of the functionality is there on interface
>>> > > level in ReplicationPolicy. Is there anything that is missing
>>> > > from there and if so, what do you think about pulling it into the
>>> > > interface? If this improvement only works with the default
>>> > > replication policy, then it's somewhat limiting as users may have
>>> > > their own policy for various reasons, but if we can make it work
>>> > > on the interface level, then we could provide this feature to
>>> > > everyone. Of course there can be replication policies like the
>>> > > identity one that by design disallows this feature, but for that,
>>> > > see my previous point.
>>> > >
>>> > > Best,
>>> > > Viktor
>>> > >
>>> > > On Fri, Oct 18, 2024 at 3:30 PM Dániel Urbán
>>> > > <urb.dani...@gmail.com> wrote:
>>> > >
>>> > > > Hi everyone,
>>> > > >
>>> > > > I'd like to start the discussion on KIP-1098: Reverse
>>> > > > Checkpointing (
>>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1098%3A+Reverse+Checkpointing
>>> > > > ) which aims to minimize message reprocessing for consumers in
>>> > > > failbacks.
>>> > > >
>>> > > > TIA,
>>> > > > Daniel
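
For readers following the SVV2 discussion and the latest update, here is a rough sketch of why the topic name alone is enough with a prefixing policy but not with an identity-style policy. It is only an illustration under stated assumptions: it does not use the real MirrorMaker classes, and the names TopicSourceSketch, prefixTopicSource, replicatedFrom and selectedByFilter are hypothetical. The "." separator and the shape of the extra topic filter are also assumptions, not taken from the KIP or the draft PR.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TopicSourceSketch {

    // Prefix-based origin lookup, in the spirit of a "topicSource" method:
    // "A.orders" -> "A"; a name without the separator has no known source.
    static String prefixTopicSource(String topic) {
        int idx = topic.indexOf('.');
        return idx < 0 ? null : topic.substring(0, idx);
    }

    // Topics whose name alone identifies them as replicated from clusterAlias.
    static List<String> replicatedFrom(String clusterAlias, List<String> topics) {
        return topics.stream()
                .filter(t -> clusterAlias.equals(prefixTopicSource(t)))
                .collect(Collectors.toList());
    }

    // With unchanged topic names the origin cannot be derived, so the user
    // has to say which topics qualify (hypothetical extra topic filter).
    static List<String> selectedByFilter(Predicate<String> filter, List<String> topics) {
        return topics.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prefixing policy: the replicated topic is recognisable by name.
        List<String> prefixed = List.of("A.orders", "payments", "C.audit");
        System.out.println(replicatedFrom("A", prefixed));

        // Identity-style naming: nothing in the name points back to A.
        List<String> identity = List.of("orders", "payments", "audit");
        System.out.println(replicatedFrom("A", identity));

        // An explicit filter restores the missing information.
        System.out.println(selectedByFilter(t -> t.equals("orders"), identity));
    }
}
```

With prefixed names the first lookup finds the topic replicated from A, while with identity-style names the same lookup finds nothing, which is the gap the extra topic filter is meant to cover.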