Hi Omina, We have not considered this path yet that you proposed and I like it as well and I would also be in favor of a solution that doesn't try to introduce a new config. Since this KIP would only make it into 4.0 anyway and your proposal seems like a smaller compatibility break that can be documented in the upgrade notes and also circumvented by a custom policy, I think we can do this.
I'd be also happy to receive some feedback from those who have been in this conversation to form a consensus around this. Best, Viktor On Mon, Aug 12, 2024 at 2:19 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote: > Hi Patrick, > > One thing that comes up for me is whether we should update the > TopicFilter > > too to have this more specific pattern, like "mm2-.*.internal". Ideally > I'd > > say yes, but what if a user already has existing .internal user topics > and > > relies on the ReplicationPolicy and TopicFilter not to replicate those, > and > > by changing both to a more specific pattern, we could technically enable > > their replication. I know this is an edge case, so am I being too careful > > here? > > I would say for the ReplicationPolicy blocking all `.internal` and > `-internal` as MM2 / Kafka internals is a bug. So anyone who has been > relaying on this bug to block topics is relaying on a faulty code > especially that none of the Kafka, connect or streams internals do create > any topic with these suffixes except MM2 and all of them can be identified > more accurately without this generic suffix. > > > The other potential cases we might break are > 1. where `replication.policy.internal.topic.separator.enabled` is enabled, > but we might only need to adjust the code to account for this either in > ReplicationPolicy itself > ` default boolean isMM2InternalTopic(String topic) { return > (topic.startsWith("mm2-") && topic.endsWith("internal")) || > isCheckpointsTopic(topic); }` > Or in DefaultReplicationPolicy implementation of `isMM2InternalTopic`. So > we don’t break the backward compatibility for this case > 2. Any use case has a customised ReplicationPolicy with an override of > isMM2InternalTopic. This one might also break the backward compatibility > > As 4.0 coming soon we can leverage this and introduce a breaking change to > MM2 with clarification of the KIP's impact on backward compatibility for > the following cases, > - For anyone has been relaying on this bug to block normal topics with > `internal` suffix they will need to update topic filter config to block > this suffix manually if they wish for. > - For anyone overriding isMM2InternalTopic in their custom > ReplicationPolicy, they might need to update their code account for the new > change to block only mm2 internal topics. > > If this path is acceptable with breaking the compatibility with the next > major Kafka release we should consider fixing the issue and not introducing > another config to go around it especially that this seems more like mistake > from the beginning as MM2 original KIP intention was trying to target MM2 > and Kafka internal topics only and not all topics with this suffix. > > What does everyone think? > > Thanks > Omnia > > On 9 Aug 2024, at 10:40, Patrik Marton <pmar...@cloudera.com.INVALID> > wrote: > > > > Hi All, > > > > Thanks again for your feedback! > > > > Regarding the boolean solution, yes this would technically let the > customer > > accidentally replicate "__.*" topics, but only if they modify the > > TopicFilter to allow their replication, as by default topics matching > this > > pattern are excluded. But in this case we can still hardcode "__.*" > topics > > as internal in the replication policy, if we feel like we need that extra > > layer of safety. > > > > Thanks Omnia for your ideas! > > About #2, I think this is something that we should consider besides the > > boolean solution, as I like the idea that this way we can avoid adding > > another configuration to mm2, and simply define a more specific criteria > > for topics to be considered internal by the replication policy. > > One thing that comes up for me is whether we should update the > TopicFilter > > too to have this more specific pattern, like "mm2-.*.internal". Ideally > I'd > > say yes, but what if a user already has existing .internal user topics > and > > relies on the ReplicationPolicy and TopicFilter not to replicate those, > and > > by changing both to a more specific pattern, we could technically enable > > their replication. I know this is an edge case, so am I being too careful > > here? > > > > Let me know your thoughts, and I will update the KIP too. > > > > Best, > > Patrik > > > > On Thu, Aug 8, 2024 at 2:59 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com> > > wrote: > > > >> Hi > >> Thinking of it more, I feel leaving customers to mirroring Kafka’s > >> internal topics `__.*` either by mistake or intently will be eating up > >> resources from both mm2 and the clusters for no actual value as the > data is > >> not that useful outside the context of the original cluster. > >> These topics usually are unbalanced and clients are the one the control > >> their sizes which will lead to uneven distribution of MM2 resource > usages. > >> And if MM2 missed the tombstone records from the source cluster which is > >> used to cleanup metadata then mirrored records of these topics wouldn’t > get > >> cleaned appropriately. > >> > >> Other options to solve this > >> > >> 1. introduce a way to specify MM2’s internal topics suffix (checkpoints, > >> offset sync, offsets, status, and config topics) and by default this is > >> `.internal`, in case users are allowed to have `.internal` in the topic > >> name then MM2 can be setup with different suffix. > >> `-internal` was set as a way to identify connect internal topics but as > >> far as I can see there is no topic set by MM2 itself with `-internal` so > >> maybe we can drop `-internal`. > >> Then for __.* these aren’t allowed to be mirrored. > >> > >> 2. other option is updating `ReplicationPolicy::isMM2InternalTopic` to > >> check if topic is starting with `mm2` and end with `.internal` as this > is > >> the pattern for default topics so far. > >>> default boolean isMM2InternalTopic(String topic) { > >>> // MM2 has a pattern for most topics that shouldn't be mirriord > >> which is mm2-*.*.internal: > >>> // "mm2-offset-syncs." + clusterAlias + ".internal" > >>> // "mm2-configs." + sourceAndTarget.source() + ".internal" > >>> // "mm2-status." + sourceAndTarget.source() + ".internal" > >>> // "mm2-offsets." + sourceAndTarget.source() + ".internal" > >>> // The only exception is checkpoints which is clusterAlias + > >> ".checkpoints.internal" > >>> return (topic.startsWith("mm2-") && topic.endsWith(".internal")) || > >> isCheckpointsTopic(topic); > >>> } > >> > >> As far as I can see this will cover offset-sync topic, all default > connect > >> topic names used by mirror and checkpoints which what the method > originally > >> is trying to target. This way we will have some sense of “fake > namespace” > >> without introducing one. > >> > >> Then `ReplicationPolicy::isInternalTopic` can be > >>> default boolean isInternalTopic(String topic) { > >>> boolean isKafkaInternalTopic = topic.startsWith("__") || > >> topic.startsWith("."); > >>> return isMM2InternalTopic(topic) || isKafkaInternalTopic; > >>> } > >> > >> `-internal` will be dropped as I can’t see any topic name with this > suffix > >> setup by MM2. > >> > >> Best, > >> Omnia > >> > >>> On 7 Aug 2024, at 17:26, Omnia Ibrahim <o.g.h.ibra...@gmail.com> > wrote: > >>> > >>> Hi All, > >>> > >>> I have a couple of notes to add > >>> > >>> OI-1.Regarding the proposed default regex in the KIP, > >> "__.*|\\..*|.*-internal|.*\\.internal", when users wish to change this, > >> wouldn’t they mostly need to keep the __.* part? Especially since __ is > >> primarily reserved for broker internal metadata topics. > >>> > >>> OI-2.Whether we choose a boolean or regex, I think we need to identify > >> if this will include replicating the internal metadata topics. The > metadata > >> within these topics are not that useful outside of their original > cluster, > >> so replicating them doesn’t add much value for anyone. > >>> The other problem with mistakenly replicating these topics is that they > >> are set up with a compacted policy, which means they don’t get truncated > >> naturally by retention. These metadata topics are cleaned in a special > way > >> to stay within the metadata retention milliseconds or expiration > configs. > >> As a result of this special situation, replicating these by mistake > might > >> put the destination cluster’s capacity at risk. > >>> Maybe we should be clear that any topic in Topic::INTERNAL_TOPICS is > not > >> going to be replicated? WDYT? > >>> > >>> Thanks > >>> Omnia > >>> > >>>> On 7 Aug 2024, at 15:46, Viktor Somogyi-Vass < > >> viktor.somo...@cloudera.com.INVALID> wrote: > >>>> > >>>> Hi all, > >>>> > >>>> 1. Yea, thinking of it more, I'm also concerned about the upgrade path > >> and > >>>> I think that it may not be worth adding a ton of code to work around > >> that > >>>> one time upgrade. It likely needs extra configs as well or > >> documentation to > >>>> instruct users, so it may not be worth the price. So let's add this to > >> the > >>>> rejected alternatives. > >>>> > >>>> 2. As Chris mentioned, I also feel that users may want to improve it > >> with > >>>> regexes and after a certain number of topics it may not be easily > >>>> manageable. And indeed, it feels a little bit like duplicating the > >>>> functionality of the topic filter. Overall I like the boolean config > >> best > >>>> as that prevents most of the unwanted replication cycles by default > with > >>>> the TopicFilter. > >>>> > >>>> Best, > >>>> Viktor > >>>> > >>>> On Wed, Aug 7, 2024 at 10:59 AM Patrik Marton > >> <pmar...@cloudera.com.invalid> > >>>> wrote: > >>>> > >>>>> Hi All, > >>>>> > >>>>> Thanks again for your feedback! > >>>>> > >>>>> My thoughts on Viktor's ideas: > >>>>> > >>>>> 1. This would definitely be a clean solution in the end, but I also > >> feel > >>>>> that the complexity of this solution might outweigh the benefits. If > >> you > >>>>> guys think, we can give it a thought and figure out what exactly > needs > >> to > >>>>> be changed and how the upgrade path would look like, but my opinion > is > >> that > >>>>> since it is a rare use-case, a simple solution might be better. > >>>>> > >>>>> 2. I initially wanted to propose something like this, but my concern > >> was > >>>>> that by adding another way a topic can be whitelisted, we would also > >>>>> increase the complexity and it would require more attention from > >> users. But > >>>>> I also think that this would probably be the safest solution of all, > >> and > >>>>> less risky than modifying regexes or using that simple boolean. > >>>>> Since the TopicFilter is not really the problem as its exclude list > >> can be > >>>>> modified, and it basically acts as another layer of safety, we could > >>>>> potentially add this new config (a list of topics) to the > >>>>> DefaultReplicationPolicy, and any topic listed in the new config > would > >> not > >>>>> be considered internal by the replication policy. This way we would > not > >>>>> duplicate the work of the Topic Filter, simply provide a way to tell > >> the > >>>>> DefaultReplicationPolicy that topics in this list seem internal, but > >> they > >>>>> are not. > >>>>> > >>>>> What do you think? > >>>>> > >>>>> Best, > >>>>> Patrik > >>>>> > >>>>> > >>>>> On Tue, Aug 6, 2024 at 7:17 PM Chris Egerton <chr...@aiven.io.invalid > > > >>>>> wrote: > >>>>> > >>>>>> Hi Viktor, > >>>>>> > >>>>>> Nice to hear from you! My thoughts: > >>>>>> > >>>>>> 1. I initially liked this idea, but it doesn't seem like the costs > to > >> all > >>>>>> existing users outweigh the benefits that would be reaped by just a > >> few. > >>>>>> This is possibly because I'm assuming the upgrade path would be > pretty > >>>>>> gnarly, though; if we can make it really, really smooth, especially > >> for > >>>>>> existing users who don't need to be able to replicate topics ending > in > >>>>>> "-internal", then I think it'd be a viable option. > >>>>>> > >>>>>> 2. This is definitely safer than a simple boolean property, but > >> doesn't > >>>>> it > >>>>>> feel like we're duplicating the work of the topic filter at this > >> point, > >>>>>> except with something a little more cumbersome for users? People > >> already > >>>>>> have a way of include- and exclude-listing topics by regex with that > >>>>>> mechanism, and I'm worried we're not placing enough trust in the > >>>>>> flexibility and safety it already provides us and are adding layers > of > >>>>>> safety that are ultimately redundant. I might also be pessimistic > here > >>>>> but > >>>>>> I can't shake the feeling that we'd eventually get a KIP where > someone > >>>>>> states that it's too hard to add topics one-by-one and would like to > >>>>>> include them by regex, prefix match, or some other broader > mechanism. > >>>>>> > >>>>>> Do you find this convincing? I'm also curious what Greg and Patrik > >> think. > >>>>>> > >>>>>> Cheers, > >>>>>> > >>>>>> Chris > >>>>>> > >>>>>> On Tue, Aug 6, 2024 at 5:08 AM Viktor Somogyi-Vass > >>>>>> <viktor.somo...@cloudera.com.invalid> wrote: > >>>>>> > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I think there are a couple of other alternatives to discuss. > >>>>>>> > >>>>>>> 1. We may just simply change the internal topic naming to something > >>>>> like > >>>>>>> "mm2internal" and provide upgrade functionality as well. In this > >>>>> scenario > >>>>>>> we wouldn't need to tinker with the prefixes or the naming > patterns, > >> we > >>>>>>> just simply free up the "internal" keyword and after upgrade allow > >> all > >>>>>>> topics with internal to be mirrored. The upgrade however could be > >> very > >>>>>>> complicated as we'd need to provide a way for users to move away > from > >>>>> the > >>>>>>> old topics (what happens in case of EOS, how do we make sure to > >>>>>> discontinue > >>>>>>> the old topics, what happens if the upgrade gets interrupted etc.). > >>>>> But I > >>>>>>> think this is worth considering and for the upgrade, we may take a > >> few > >>>>>>> ideas from the zk->kraft migration to make it easier. > >>>>>>> > >>>>>>> 2. A new config to force the replication of certain topics. In > this, > >>>>>> users > >>>>>>> would have to specify topics 1 by 1 and this would bypass the > filters > >>>>> and > >>>>>>> the replication policy. Since forcing is usually not the normal way > >> of > >>>>>>> doing things in general, users may be more willing to accept > >> unintended > >>>>>>> side effects. > >>>>>>> > >>>>>>> What do you all think? > >>>>>>> (If they aren't better than the original ideas, I'm absolutely good > >>>>> with > >>>>>>> putting these to the rejected alternatives but wanted to write it > all > >>>>>> out.) > >>>>>>> > >>>>>>> Best, > >>>>>>> Viktor > >>>>>>> > >>>>>>> On Mon, Aug 5, 2024 at 8:18 PM Chris Egerton > <chr...@aiven.io.invalid > >>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi Patrik, > >>>>>>>> > >>>>>>>> I think I owe you an apology. After mulling this over for a few > more > >>>>>>> hours > >>>>>>>> and going over the code base some more, it seems like your initial > >>>>>>> approach > >>>>>>>> was actually better in some ways. > >>>>>>>> > >>>>>>>> With the regex-based approach, it becomes possible for users to > >>>>>> configure > >>>>>>>> the DefaultReplicationPolicy class in a way that for some topic T, > >>>>>>>> isInternalTopic(T) returns false, but isCheckpointsTopic(T), > >>>>>>>> isHeartbeatsTopic(T), or even isMM2InternalTopic(T) return true. > It > >>>>>> seems > >>>>>>>> like this kind of inconsistency might cause confusion to users and > >>>>>>> possibly > >>>>>>>> even break things down the road depending on how we leverage these > >>>>>>> methods > >>>>>>>> in the future. > >>>>>>>> > >>>>>>>> With the initial approach, using a simple boolean to enable the > >>>>>>> replication > >>>>>>>> of what appear to be internal topics, there is some risk of cycles > >>>>> and > >>>>>>>> accidental replication of genuinely internal topics, but less > than I > >>>>>>>> believed at first. In order for an internal topic to be > replicated, > >>>>> not > >>>>>>>> only would that boolean property have to be explicitly modified by > >>>>>> users, > >>>>>>>> but they would also have to be using a topic filter that allows > >> those > >>>>>>>> topics to pass through as well. > >>>>>>>> > >>>>>>>> I think a simple boolean is better here, as it's easier for users > to > >>>>>>>> configure, and less prone to logical inconsistencies. My only > >>>>>> suggestions > >>>>>>>> are: > >>>>>>>> - Apply the property to the MirrorSourceConnector class instead of > >>>>> the > >>>>>>>> DefaultReplicationPolicy class, since we're not actually changing > >> the > >>>>>>>> definition of what is and isn't an internal topic; instead, we're > >>>>>> simply > >>>>>>>> allowing topics to be replicated even if they appear to be > internal > >>>>>>> topics > >>>>>>>> - Rename it to something less verbose like > >>>>> "replicate.internal.topics" > >>>>>>>> - Add a warning to the docs for the property explicitly pointing > >>>>> people > >>>>>>> to > >>>>>>>> the topic filter class, stating that they should still make sure > >> that > >>>>>>>> genuinely internal topics (instead of topics that > >>>>>>>> ReplicationPolicy::isInternalTopic returns true for but which are > >> not > >>>>>>>> actually internal) are excluded by the filter > >>>>>>>> > >>>>>>>> Patrik, Greg, how does that sound? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> > >>>>>>>> Chris > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Aug 5, 2024 at 4:34 AM Patrik Marton > >>>>>>> <pmar...@cloudera.com.invalid > >>>>>>>>> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi Greg and Chris, > >>>>>>>>> > >>>>>>>>> Thank you for your feedbacks! > >>>>>>>>> > >>>>>>>>> I added the currently existing workarounds to the “Rejected > >>>>>>> Alternatives” > >>>>>>>>> section, and explained why I still think we need a separate > >>>>>> solution. I > >>>>>>>>> also added the previous proposal to the rejected alternatives, > as I > >>>>>>> agree > >>>>>>>>> with your concerns. > >>>>>>>>> I tried to come up with a bit different regex based solution, and > >>>>>>>> modified > >>>>>>>>> the KIP based on that. I think it is a more straightforward way > to > >>>>>> get > >>>>>>>> the > >>>>>>>>> desired behavior. > >>>>>>>>> > >>>>>>>>> Let me know your thoughts. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Patrik > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2024/07/30 18:57:18 Chris Egerton wrote: > >>>>>>>>>> Hi Patrick, > >>>>>>>>>> > >>>>>>>>>> I share Greg's concerns with the feature as it's currently > >>>>>> proposed. > >>>>>>> I > >>>>>>>>>> don't think I could vote for something unless it made > replication > >>>>>> of > >>>>>>>>>> genuinely internal topics and replication cycles impossible, or > >>>>> at > >>>>>>>> least > >>>>>>>>>> significantly less likely. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> > >>>>>>>>>> Chris > >>>>>>>>>> > >>>>>>>>>> On Tue, Jul 30, 2024, 14:51 Greg Harris <gr...@aiven.io.invalid > > > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Patrik, > >>>>>>>>>>> > >>>>>>>>>>> Thanks for the KIP! > >>>>>>>>>>> > >>>>>>>>>>> Your motivation for this KIP is reasonable, because it is > >>>>>>> definitely > >>>>>>>>>>> possible for the ".internal" suffix to collide with real > >>>>> topics. > >>>>>> It > >>>>>>>>> would > >>>>>>>>>>> have been nice if the original design included some > >>>>> mm2-specific > >>>>>>>>> namespace > >>>>>>>>>>> like "mm2.internal" to lessen the likelihood of a collision. > >>>>>>>>>>> > >>>>>>>>>>> However, this is a problem that has numerous existing > >>>>>> workarounds: > >>>>>>>>>>> * Use a custom ReplicationPolicy and override the methods (for > >>>>>>>> existing > >>>>>>>>>>> workloads/mirror makers) > >>>>>>>>>>> * Use non-conflicting user topic names (for new user topics) > >>>>>>>>>>> * Use the replication.policy.separator to use a non-conflicting > >>>>>>>>> separator > >>>>>>>>>>> character (for new mirror maker setups) > >>>>>>>>>>> > >>>>>>>>>>> And the feature as-described has significant risks attached: > >>>>>>>>>>> * May allow replication cycles and runaway replication > >>>>>>>>>>> * Mirrors internal topics that are unusable on the destination > >>>>>>>> cluster > >>>>>>>>>>> > >>>>>>>>>>> While these risks can be accounted for if a user is attentive > >>>>>> (e.g. > >>>>>>>>> when > >>>>>>>>>>> they're writing their own ReplicationPolicy) it is not a > >>>>>> risk-free > >>>>>>>>>>> configuration that composes well with other out-of-the-box > >>>>>>>>> configurations. > >>>>>>>>>>> For example, someone may expect to take their existing > >>>>>>> configuration, > >>>>>>>>> turn > >>>>>>>>>>> on this new option, and expect reasonable behavior, which isn't > >>>>>>>> always > >>>>>>>>>>> guaranteed. > >>>>>>>>>>> > >>>>>>>>>>> If you're still interested in this feature, please reference > >>>>> the > >>>>>>>>> existing > >>>>>>>>>>> workarounds and include them as rejected alternatives so we can > >>>>>>> know > >>>>>>>>> where > >>>>>>>>>>> the existing solutions fall short. > >>>>>>>>>>> We'd also have to figure out if and how the risks I mentioned > >>>>>> could > >>>>>>>> be > >>>>>>>>>>> mitigated. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Greg > >>>>>>>>>>> > >>>>>>>>>>> On Tue, Jul 30, 2024 at 5:49 AM Patrik Marton > >>>>>>>>> <pmar...@cloudera.com.invalid > >>>>>>>>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi Team, > >>>>>>>>>>>> > >>>>>>>>>>>> I would like to start a discussion on KIP-1074: Make the > >>>>>>>> replication > >>>>>>>>> of > >>>>>>>>>>>> internal topics configurable > >>>>>>>>>>>> < > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1074%3A+Make+the+replication+of+internal+topics+configurable > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> The goal of this KIP is to make it easier to replicate topics > >>>>>>> that > >>>>>>>>> seem > >>>>>>>>>>>> internal (for example ending in .internal suffix), but are > >>>>> just > >>>>>>>>> normal > >>>>>>>>>>>> business topics created by the user. > >>>>>>>>>>>> > >>>>>>>>>>>> I appreciate any feedback and recommendations! > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks! > >>>>>>>>>>>> Patrik > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > >> > >> > >