Hi Patrik,

I think I owe you an apology. After mulling this over for a few more hours
and going over the code base some more, it seems like your initial approach
was actually better in some ways.

With the regex-based approach, it becomes possible for users to configure
the DefaultReplicationPolicy class in a way that for some topic T,
isInternalTopic(T) returns false, but isCheckpointsTopic(T),
isHeartbeatsTopic(T), or even isMM2InternalTopic(T) return true. It seems
like this kind of inconsistency might cause confusion to users and possibly
even break things down the road depending on how we leverage these methods
in the future.

With the initial approach, using a simple boolean to enable the replication
of what appear to be internal topics, there is some risk of cycles and
accidental replication of genuinely internal topics, but less than I
believed at first. In order for an internal topic to be replicated, not
only would that boolean property have to be explicitly modified by users,
but they would also have to be using a topic filter that allows those
topics to pass through as well.

I think a simple boolean is better here, as it's easier for users to
configure, and less prone to logical inconsistencies. My only suggestions
are:
- Apply the property to the MirrorSourceConnector class instead of the
DefaultReplicationPolicy class, since we're not actually changing the
definition of what is and isn't an internal topic; instead, we're simply
allowing topics to be replicated even if they appear to be internal topics
- Rename it to something less verbose like "replicate.internal.topics"
- Add a warning to the docs for the property explicitly pointing people to
the topic filter class, stating that they should still make sure that
genuinely internal topics (instead of topics that
ReplicationPolicy::isInternalTopic returns true for but which are not
actually internal) are excluded by the filter

Patrik, Greg, how does that sound?

Best,

Chris


On Mon, Aug 5, 2024 at 4:34 AM Patrik Marton <pmar...@cloudera.com.invalid>
wrote:

> Hi Greg and Chris,
>
> Thank you for your feedbacks!
>
> I added the currently existing workarounds to the “Rejected Alternatives”
> section, and explained why I still think we need a separate solution. I
> also added the previous proposal to the rejected alternatives, as I agree
> with your concerns.
> I tried to come up with a bit different regex based solution, and modified
> the KIP based on that. I think it is a more straightforward way to get the
> desired behavior.
>
> Let me know your thoughts.
>
> Thanks,
> Patrik
>
>
>
> On 2024/07/30 18:57:18 Chris Egerton wrote:
> > Hi Patrick,
> >
> > I share Greg's concerns with the feature as it's currently proposed. I
> > don't think I could vote for something unless it made replication of
> > genuinely internal topics and replication cycles impossible, or at least
> > significantly less likely.
> >
> > Best,
> >
> > Chris
> >
> > On Tue, Jul 30, 2024, 14:51 Greg Harris <gr...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Patrik,
> > >
> > > Thanks for the KIP!
> > >
> > > Your motivation for this KIP is reasonable, because it is definitely
> > > possible for the ".internal" suffix to collide with real topics. It
> would
> > > have been nice if the original design included some mm2-specific
> namespace
> > > like "mm2.internal" to lessen the likelihood of a collision.
> > >
> > > However, this is a problem that has numerous existing workarounds:
> > > * Use a custom ReplicationPolicy and override the methods (for existing
> > > workloads/mirror makers)
> > > * Use non-conflicting user topic names (for new user topics)
> > > * Use the replication.policy.separator to use a non-conflicting
> separator
> > > character (for new mirror maker setups)
> > >
> > > And the feature as-described has significant risks attached:
> > > * May allow replication cycles and runaway replication
> > > * Mirrors internal topics that are unusable on the destination cluster
> > >
> > > While these risks can be accounted for if a user is attentive (e.g.
> when
> > > they're writing their own ReplicationPolicy) it is not a risk-free
> > > configuration that composes well with other out-of-the-box
> configurations.
> > > For example, someone may expect to take their existing configuration,
> turn
> > > on this new option, and expect reasonable behavior, which isn't always
> > > guaranteed.
> > >
> > > If you're still interested in this feature, please reference the
> existing
> > > workarounds and include them as rejected alternatives so we can know
> where
> > > the existing solutions fall short.
> > > We'd also have to figure out if and how the risks I mentioned could be
> > > mitigated.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Tue, Jul 30, 2024 at 5:49 AM Patrik Marton
> <pmar...@cloudera.com.invalid
> > > >
> > > wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I would like to start a discussion on KIP-1074: Make the replication
> of
> > > > internal topics configurable
> > > > <
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1074%3A+Make+the+replication+of+internal+topics+configurable
> > > > >
> > > >
> > > > The goal of this KIP is to make it easier to replicate topics that
> seem
> > > > internal (for example ending in .internal suffix), but are just
> normal
> > > > business topics created by the user.
> > > >
> > > > I appreciate any feedback and recommendations!
> > > >
> > > > Thanks!
> > > > Patrik
> > > >
> > >
> >

Reply via email to