Hi Patrick, 
> One thing that comes up for me is whether we should update the TopicFilter
> too to have this more specific pattern, like "mm2-.*.internal". Ideally I'd
> say yes, but what if a user already has existing .internal user topics and
> relies on the ReplicationPolicy and TopicFilter not to replicate those, and
> by changing both to a more specific pattern, we could technically enable
> their replication. I know this is an edge case, so am I being too careful
> here?

I would say for the ReplicationPolicy blocking all `.internal` and `-internal` 
as MM2 / Kafka internals is a bug. So anyone who has been relaying on this bug 
to block topics is relaying on a faulty code especially that none of the Kafka, 
connect or streams internals do create any topic with these suffixes except MM2 
and all of them can be identified more accurately without this generic suffix. 


The other potential cases we might break are
1. where `replication.policy.internal.topic.separator.enabled` is enabled, but 
we might only need to adjust the code to account for this either in 
ReplicationPolicy itself
` default boolean isMM2InternalTopic(String topic) { return 
(topic.startsWith("mm2-") && topic.endsWith("internal")) || 
isCheckpointsTopic(topic); }`
Or in DefaultReplicationPolicy implementation of `isMM2InternalTopic`.  So we 
don’t break the backward compatibility for this case 
2. Any use case has a customised ReplicationPolicy with an override of 
isMM2InternalTopic. This one might also break the backward compatibility 

As 4.0 coming soon we can leverage this and introduce a breaking change to MM2 
with clarification of the KIP's impact on backward compatibility for the 
following cases, 
- For anyone has been relaying on this bug to block normal topics with 
`internal` suffix they will need to update topic filter config to block this 
suffix manually if they wish for. 
- For anyone overriding  isMM2InternalTopic in their custom ReplicationPolicy, 
they might need to update their code account for the new change to block only 
mm2 internal topics. 

If this path is acceptable with breaking the compatibility with the next major 
Kafka release we should consider fixing the issue and not introducing another 
config to go around it especially that this seems more like mistake from the 
beginning as MM2 original KIP intention was trying to target MM2 and Kafka 
internal topics only and not all topics with this suffix. 

What does everyone think?

Thanks 
Omnia
> On 9 Aug 2024, at 10:40, Patrik Marton <pmar...@cloudera.com.INVALID> wrote:
> 
> Hi All,
> 
> Thanks again for your feedback!
> 
> Regarding the boolean solution, yes this would technically let the customer
> accidentally replicate "__.*" topics, but only if they modify the
> TopicFilter to allow their replication, as by default topics matching this
> pattern are excluded. But in this case we can still hardcode "__.*" topics
> as internal in the replication policy, if we feel like we need that extra
> layer of safety.
> 
> Thanks Omnia for your ideas!
> About #2, I think this is something that we should consider besides the
> boolean solution, as I like the idea that this way we can avoid adding
> another configuration to mm2, and simply define a more specific criteria
> for topics to be considered internal by the replication policy.
> One thing that comes up for me is whether we should update the TopicFilter
> too to have this more specific pattern, like "mm2-.*.internal". Ideally I'd
> say yes, but what if a user already has existing .internal user topics and
> relies on the ReplicationPolicy and TopicFilter not to replicate those, and
> by changing both to a more specific pattern, we could technically enable
> their replication. I know this is an edge case, so am I being too careful
> here?
> 
> Let me know your thoughts, and I will update the KIP too.
> 
> Best,
> Patrik
> 
> On Thu, Aug 8, 2024 at 2:59 PM Omnia Ibrahim <o.g.h.ibra...@gmail.com>
> wrote:
> 
>> Hi
>> Thinking of it more, I feel leaving customers to mirroring Kafka’s
>> internal topics `__.*` either by mistake or intently will be eating up
>> resources from both mm2 and the clusters for no actual value as the data is
>> not that useful outside the context of the original cluster.
>> These topics usually are unbalanced and clients are the one the control
>> their sizes which will lead to uneven distribution of MM2 resource usages.
>> And if MM2 missed the tombstone records from the source cluster which is
>> used to cleanup metadata then mirrored records of these topics wouldn’t get
>> cleaned appropriately.
>> 
>> Other options to solve this
>> 
>> 1. introduce a way to specify MM2’s internal topics suffix (checkpoints,
>> offset sync, offsets, status, and config topics) and by default this is
>> `.internal`, in case users are allowed to have `.internal` in the topic
>> name then MM2 can be setup with different suffix.
>> `-internal` was set as a way to identify connect internal topics but as
>> far as I can see there is no topic set by MM2 itself with `-internal` so
>> maybe we can drop `-internal`.
>> Then  for  __.* these aren’t allowed to be mirrored.
>> 
>> 2. other option is updating `ReplicationPolicy::isMM2InternalTopic` to
>> check if topic is starting with `mm2` and end with `.internal` as this is
>> the pattern for default topics so far.
>>> default boolean isMM2InternalTopic(String topic) {
>>>    // MM2 has a pattern for most topics that shouldn't be mirriord
>> which is mm2-*.*.internal:
>>>    // "mm2-offset-syncs." + clusterAlias + ".internal"
>>>    // "mm2-configs." + sourceAndTarget.source() + ".internal"
>>>    // "mm2-status." + sourceAndTarget.source() + ".internal"
>>>    // "mm2-offsets." + sourceAndTarget.source() + ".internal"
>>>    // The only exception is checkpoints which is clusterAlias +
>> ".checkpoints.internal"
>>>    return (topic.startsWith("mm2-") && topic.endsWith(".internal")) ||
>> isCheckpointsTopic(topic);
>>> }
>> 
>> As far as I can see this will cover offset-sync topic, all default connect
>> topic names used by mirror and checkpoints which what the method originally
>> is trying to target. This way we will have some sense of “fake namespace”
>> without introducing one.
>> 
>> Then `ReplicationPolicy::isInternalTopic` can be
>>> default boolean isInternalTopic(String topic) {
>>>    boolean isKafkaInternalTopic = topic.startsWith("__") ||
>> topic.startsWith(".");
>>>    return isMM2InternalTopic(topic) || isKafkaInternalTopic;
>>> }
>> 
>> `-internal` will be dropped as I can’t see any topic name with this suffix
>> setup by MM2.
>> 
>> Best,
>> Omnia
>> 
>>> On 7 Aug 2024, at 17:26, Omnia Ibrahim <o.g.h.ibra...@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> I have a couple of notes to add
>>> 
>>> OI-1.Regarding the proposed default regex in the KIP,
>> "__.*|\\..*|.*-internal|.*\\.internal", when users wish to change this,
>> wouldn’t they mostly need to keep the __.* part? Especially since __ is
>> primarily reserved for broker internal metadata topics.
>>> 
>>> OI-2.Whether we choose a boolean or regex, I think we need to identify
>> if this will include replicating the internal metadata topics. The metadata
>> within these topics are not that useful outside of their original cluster,
>> so replicating them doesn’t add much value for anyone.
>>> The other problem with mistakenly replicating these topics is that they
>> are set up with a compacted policy, which means they don’t get truncated
>> naturally by retention. These metadata topics are cleaned in a special way
>> to stay within the metadata retention milliseconds or expiration configs.
>> As a result of this special situation, replicating these by mistake might
>> put the destination cluster’s capacity at risk.
>>> Maybe we should be clear that any topic in Topic::INTERNAL_TOPICS is not
>> going to be replicated? WDYT?
>>> 
>>> Thanks
>>> Omnia
>>> 
>>>> On 7 Aug 2024, at 15:46, Viktor Somogyi-Vass <
>> viktor.somo...@cloudera.com.INVALID> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> 1. Yea, thinking of it more, I'm also concerned about the upgrade path
>> and
>>>> I think that it may not be worth adding a ton of code to work around
>> that
>>>> one time upgrade. It likely needs extra configs as well or
>> documentation to
>>>> instruct users, so it may not be worth the price. So let's add this to
>> the
>>>> rejected alternatives.
>>>> 
>>>> 2. As Chris mentioned, I also feel that users may want to improve it
>> with
>>>> regexes and after a certain number of topics it may not be easily
>>>> manageable. And indeed, it feels a little bit like duplicating the
>>>> functionality of the topic filter. Overall I like the boolean config
>> best
>>>> as that prevents most of the unwanted replication cycles by default with
>>>> the TopicFilter.
>>>> 
>>>> Best,
>>>> Viktor
>>>> 
>>>> On Wed, Aug 7, 2024 at 10:59 AM Patrik Marton
>> <pmar...@cloudera.com.invalid>
>>>> wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Thanks again for your feedback!
>>>>> 
>>>>> My thoughts on Viktor's ideas:
>>>>> 
>>>>> 1. This would definitely be a clean solution in the end, but I also
>> feel
>>>>> that the complexity of this solution might outweigh the benefits. If
>> you
>>>>> guys think, we can give it a thought and figure out what exactly needs
>> to
>>>>> be changed and how the upgrade path would look like, but my opinion is
>> that
>>>>> since it is a rare use-case, a simple solution might be better.
>>>>> 
>>>>> 2. I initially wanted to propose something like this, but my concern
>> was
>>>>> that by adding another way a topic can be whitelisted, we would also
>>>>> increase the complexity and it would require more attention from
>> users. But
>>>>> I also think that this would probably be the safest solution of all,
>> and
>>>>> less risky than modifying regexes or using that simple boolean.
>>>>> Since the TopicFilter is not really the problem as its exclude list
>> can be
>>>>> modified, and it basically acts as another layer of safety, we could
>>>>> potentially add this new config (a list of topics) to the
>>>>> DefaultReplicationPolicy, and any topic listed in the new config would
>> not
>>>>> be considered internal by the replication policy. This way we would not
>>>>> duplicate the work of the Topic Filter, simply provide a way to tell
>> the
>>>>> DefaultReplicationPolicy that topics in this list seem internal, but
>> they
>>>>> are not.
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> Best,
>>>>> Patrik
>>>>> 
>>>>> 
>>>>> On Tue, Aug 6, 2024 at 7:17 PM Chris Egerton <chr...@aiven.io.invalid>
>>>>> wrote:
>>>>> 
>>>>>> Hi Viktor,
>>>>>> 
>>>>>> Nice to hear from you! My thoughts:
>>>>>> 
>>>>>> 1. I initially liked this idea, but it doesn't seem like the costs to
>> all
>>>>>> existing users outweigh the benefits that would be reaped by just a
>> few.
>>>>>> This is possibly because I'm assuming the upgrade path would be pretty
>>>>>> gnarly, though; if we can make it really, really smooth, especially
>> for
>>>>>> existing users who don't need to be able to replicate topics ending in
>>>>>> "-internal", then I think it'd be a viable option.
>>>>>> 
>>>>>> 2. This is definitely safer than a simple boolean property, but
>> doesn't
>>>>> it
>>>>>> feel like we're duplicating the work of the topic filter at this
>> point,
>>>>>> except with something a little more cumbersome for users? People
>> already
>>>>>> have a way of include- and exclude-listing topics by regex with that
>>>>>> mechanism, and I'm worried we're not placing enough trust in the
>>>>>> flexibility and safety it already provides us and are adding layers of
>>>>>> safety that are ultimately redundant. I might also be pessimistic here
>>>>> but
>>>>>> I can't shake the feeling that we'd eventually get a KIP where someone
>>>>>> states that it's too hard to add topics one-by-one and would like to
>>>>>> include them by regex, prefix match, or some other broader mechanism.
>>>>>> 
>>>>>> Do you find this convincing? I'm also curious what Greg and Patrik
>> think.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Chris
>>>>>> 
>>>>>> On Tue, Aug 6, 2024 at 5:08 AM Viktor Somogyi-Vass
>>>>>> <viktor.somo...@cloudera.com.invalid> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I think there are a couple of other alternatives to discuss.
>>>>>>> 
>>>>>>> 1. We may just simply change the internal topic naming to something
>>>>> like
>>>>>>> "mm2internal" and provide upgrade functionality as well. In this
>>>>> scenario
>>>>>>> we wouldn't need to tinker with the prefixes or the naming patterns,
>> we
>>>>>>> just simply free up the "internal" keyword and after upgrade allow
>> all
>>>>>>> topics with internal to be mirrored. The upgrade however could be
>> very
>>>>>>> complicated as we'd need to provide a way for users to move away from
>>>>> the
>>>>>>> old topics (what happens in case of EOS, how do we make sure to
>>>>>> discontinue
>>>>>>> the old topics, what happens if the upgrade gets interrupted etc.).
>>>>> But I
>>>>>>> think this is worth considering and for the upgrade, we may take a
>> few
>>>>>>> ideas from the zk->kraft migration to make it easier.
>>>>>>> 
>>>>>>> 2. A new config to force the replication of certain topics. In this,
>>>>>> users
>>>>>>> would have to specify topics 1 by 1 and this would bypass the filters
>>>>> and
>>>>>>> the replication policy. Since forcing is usually not the normal way
>> of
>>>>>>> doing things in general, users may be more willing to accept
>> unintended
>>>>>>> side effects.
>>>>>>> 
>>>>>>> What do you all think?
>>>>>>> (If they aren't better than the original ideas, I'm absolutely good
>>>>> with
>>>>>>> putting these to the rejected alternatives but wanted to write it all
>>>>>> out.)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Viktor
>>>>>>> 
>>>>>>> On Mon, Aug 5, 2024 at 8:18 PM Chris Egerton <chr...@aiven.io.invalid
>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Patrik,
>>>>>>>> 
>>>>>>>> I think I owe you an apology. After mulling this over for a few more
>>>>>>> hours
>>>>>>>> and going over the code base some more, it seems like your initial
>>>>>>> approach
>>>>>>>> was actually better in some ways.
>>>>>>>> 
>>>>>>>> With the regex-based approach, it becomes possible for users to
>>>>>> configure
>>>>>>>> the DefaultReplicationPolicy class in a way that for some topic T,
>>>>>>>> isInternalTopic(T) returns false, but isCheckpointsTopic(T),
>>>>>>>> isHeartbeatsTopic(T), or even isMM2InternalTopic(T) return true. It
>>>>>> seems
>>>>>>>> like this kind of inconsistency might cause confusion to users and
>>>>>>> possibly
>>>>>>>> even break things down the road depending on how we leverage these
>>>>>>> methods
>>>>>>>> in the future.
>>>>>>>> 
>>>>>>>> With the initial approach, using a simple boolean to enable the
>>>>>>> replication
>>>>>>>> of what appear to be internal topics, there is some risk of cycles
>>>>> and
>>>>>>>> accidental replication of genuinely internal topics, but less than I
>>>>>>>> believed at first. In order for an internal topic to be replicated,
>>>>> not
>>>>>>>> only would that boolean property have to be explicitly modified by
>>>>>> users,
>>>>>>>> but they would also have to be using a topic filter that allows
>> those
>>>>>>>> topics to pass through as well.
>>>>>>>> 
>>>>>>>> I think a simple boolean is better here, as it's easier for users to
>>>>>>>> configure, and less prone to logical inconsistencies. My only
>>>>>> suggestions
>>>>>>>> are:
>>>>>>>> - Apply the property to the MirrorSourceConnector class instead of
>>>>> the
>>>>>>>> DefaultReplicationPolicy class, since we're not actually changing
>> the
>>>>>>>> definition of what is and isn't an internal topic; instead, we're
>>>>>> simply
>>>>>>>> allowing topics to be replicated even if they appear to be internal
>>>>>>> topics
>>>>>>>> - Rename it to something less verbose like
>>>>> "replicate.internal.topics"
>>>>>>>> - Add a warning to the docs for the property explicitly pointing
>>>>> people
>>>>>>> to
>>>>>>>> the topic filter class, stating that they should still make sure
>> that
>>>>>>>> genuinely internal topics (instead of topics that
>>>>>>>> ReplicationPolicy::isInternalTopic returns true for but which are
>> not
>>>>>>>> actually internal) are excluded by the filter
>>>>>>>> 
>>>>>>>> Patrik, Greg, how does that sound?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Chris
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Aug 5, 2024 at 4:34 AM Patrik Marton
>>>>>>> <pmar...@cloudera.com.invalid
>>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Greg and Chris,
>>>>>>>>> 
>>>>>>>>> Thank you for your feedbacks!
>>>>>>>>> 
>>>>>>>>> I added the currently existing workarounds to the “Rejected
>>>>>>> Alternatives”
>>>>>>>>> section, and explained why I still think we need a separate
>>>>>> solution. I
>>>>>>>>> also added the previous proposal to the rejected alternatives, as I
>>>>>>> agree
>>>>>>>>> with your concerns.
>>>>>>>>> I tried to come up with a bit different regex based solution, and
>>>>>>>> modified
>>>>>>>>> the KIP based on that. I think it is a more straightforward way to
>>>>>> get
>>>>>>>> the
>>>>>>>>> desired behavior.
>>>>>>>>> 
>>>>>>>>> Let me know your thoughts.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Patrik
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 2024/07/30 18:57:18 Chris Egerton wrote:
>>>>>>>>>> Hi Patrick,
>>>>>>>>>> 
>>>>>>>>>> I share Greg's concerns with the feature as it's currently
>>>>>> proposed.
>>>>>>> I
>>>>>>>>>> don't think I could vote for something unless it made replication
>>>>>> of
>>>>>>>>>> genuinely internal topics and replication cycles impossible, or
>>>>> at
>>>>>>>> least
>>>>>>>>>> significantly less likely.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> 
>>>>>>>>>> Chris
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 30, 2024, 14:51 Greg Harris <gr...@aiven.io.invalid>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Patrik,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>> 
>>>>>>>>>>> Your motivation for this KIP is reasonable, because it is
>>>>>>> definitely
>>>>>>>>>>> possible for the ".internal" suffix to collide with real
>>>>> topics.
>>>>>> It
>>>>>>>>> would
>>>>>>>>>>> have been nice if the original design included some
>>>>> mm2-specific
>>>>>>>>> namespace
>>>>>>>>>>> like "mm2.internal" to lessen the likelihood of a collision.
>>>>>>>>>>> 
>>>>>>>>>>> However, this is a problem that has numerous existing
>>>>>> workarounds:
>>>>>>>>>>> * Use a custom ReplicationPolicy and override the methods (for
>>>>>>>> existing
>>>>>>>>>>> workloads/mirror makers)
>>>>>>>>>>> * Use non-conflicting user topic names (for new user topics)
>>>>>>>>>>> * Use the replication.policy.separator to use a non-conflicting
>>>>>>>>> separator
>>>>>>>>>>> character (for new mirror maker setups)
>>>>>>>>>>> 
>>>>>>>>>>> And the feature as-described has significant risks attached:
>>>>>>>>>>> * May allow replication cycles and runaway replication
>>>>>>>>>>> * Mirrors internal topics that are unusable on the destination
>>>>>>>> cluster
>>>>>>>>>>> 
>>>>>>>>>>> While these risks can be accounted for if a user is attentive
>>>>>> (e.g.
>>>>>>>>> when
>>>>>>>>>>> they're writing their own ReplicationPolicy) it is not a
>>>>>> risk-free
>>>>>>>>>>> configuration that composes well with other out-of-the-box
>>>>>>>>> configurations.
>>>>>>>>>>> For example, someone may expect to take their existing
>>>>>>> configuration,
>>>>>>>>> turn
>>>>>>>>>>> on this new option, and expect reasonable behavior, which isn't
>>>>>>>> always
>>>>>>>>>>> guaranteed.
>>>>>>>>>>> 
>>>>>>>>>>> If you're still interested in this feature, please reference
>>>>> the
>>>>>>>>> existing
>>>>>>>>>>> workarounds and include them as rejected alternatives so we can
>>>>>>> know
>>>>>>>>> where
>>>>>>>>>>> the existing solutions fall short.
>>>>>>>>>>> We'd also have to figure out if and how the risks I mentioned
>>>>>> could
>>>>>>>> be
>>>>>>>>>>> mitigated.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Greg
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Jul 30, 2024 at 5:49 AM Patrik Marton
>>>>>>>>> <pmar...@cloudera.com.invalid
>>>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Team,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to start a discussion on KIP-1074: Make the
>>>>>>>> replication
>>>>>>>>> of
>>>>>>>>>>>> internal topics configurable
>>>>>>>>>>>> <
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1074%3A+Make+the+replication+of+internal+topics+configurable
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> The goal of this KIP is to make it easier to replicate topics
>>>>>>> that
>>>>>>>>> seem
>>>>>>>>>>>> internal (for example ending in .internal suffix), but are
>>>>> just
>>>>>>>>> normal
>>>>>>>>>>>> business topics created by the user.
>>>>>>>>>>>> 
>>>>>>>>>>>> I appreciate any feedback and recommendations!
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Patrik
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 

Reply via email to