Hi all,

With MirrorMaker 2.0
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)
accepted and coming along very nicely in development, I've been wondering
whether a certain use case is supported and, if not, whether changes could
be made to Streams or MM2 to support it.  I'll explain the use case, but the
TL;DR is: "do we need more control over topic naming in MM2 or Streams?"

My team foresees using MM2 as a way to mirror data from our prod
environment to a pre-prod environment.  The data is supplied by external
vendors, introduced into our system through a Kafka Streams ETL pipeline,
and consumed by our end-applications.  Generally we would only like to run
the ETL pipeline in prod since there is an operational cost to running it
in both prod and pre-prod (the data sometimes needs manual attention).
This seems to fit MM2 well: pre-prod end-applications consume from the
pre-prod Kafka cluster, which consists entirely of "remote" topics mirrored
from the prod cluster.  We only have to keep one instance of the ETL
pipeline running, while end-applications stay separate, each connecting to
its respective prod or pre-prod Kafka cluster.
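To make the naming concrete, here is a small sketch of what the pre-prod cluster would contain, assuming the default convention described in KIP-382 (a remote topic is named with the source cluster alias, a ".", and the original topic name). The class and topic names are made up for illustration; this only models the convention and does not call any MM2 API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RemoteTopicNaming {
    // Assumption: models the default KIP-382 convention,
    // "<source cluster alias>.<original topic name>".
    static String remoteTopic(String sourceClusterAlias, String topic) {
        return sourceClusterAlias + "." + topic;
    }

    // Names the given prod topics would have once mirrored into pre-prod.
    static List<String> mirroredNames(String sourceClusterAlias, List<String> topics) {
        return topics.stream()
                .map(t -> remoteTopic(sourceClusterAlias, t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical topics in the prod cluster, including a Streams changelog.
        List<String> prodTopics =
                List.of("vendor-input", "App-statestore-changelog", "etl-output");
        // Prints [prod.vendor-input, prod.App-statestore-changelog, prod.etl-output]
        System.out.println(mirroredNames("prod", prodTopics));
    }
}
```

Every mirrored topic, including the ETL pipeline's changelog, ends up with the "prod." prefix on the pre-prod side.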

However, when we want to test changes to the ETL pipeline itself, we would
like to turn off mirroring from prod to pre-prod and also run the ETL
pipeline in pre-prod, picking up the most recent state of the prod pipeline
from when mirroring was turned off (FWIW, downtime is not an issue for our
use case).

My question/concern, basically, is whether Streams apps can work when
they're running against topics prefixed with a cluster alias, like
"prod.App-statestore-changelog", which is how MM2 plans to name mirrored
topics.  From what I can tell the answer is no, since Streams derives its
internal topic names directly from the application.id.  My proposal would
be to give the Streams user more specific control over how Streams names
its internal topics (repartition topics and changelogs) by defining an
"InternalTopicNamingStrategy" or similar.  Perhaps there is a solution on
the MM2 side as well, but budging on that convention seems much less
desirable.
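Roughly what I have in mind, as a hypothetical sketch only: neither InternalTopicNamingStrategy nor the classes below exist in Kafka Streams, and the method names are my own. DefaultNaming approximates the names Streams uses today (<application.id>-<storeName>-changelog and <application.id>-<name>-repartition), and AliasPrefixedNaming adds a cluster alias on top so that a pre-prod instance of the pipeline could resolve the topics MM2 mirrored from prod.

```java
public class NamingStrategySketch {
    // Hypothetical interface: NOT part of the Kafka Streams API.
    interface InternalTopicNamingStrategy {
        String changelogTopic(String applicationId, String storeName);
        String repartitionTopic(String applicationId, String name);
    }

    // Approximates today's naming:
    // "<application.id>-<storeName>-changelog" and "<application.id>-<name>-repartition".
    static class DefaultNaming implements InternalTopicNamingStrategy {
        public String changelogTopic(String applicationId, String storeName) {
            return applicationId + "-" + storeName + "-changelog";
        }

        public String repartitionTopic(String applicationId, String name) {
            return applicationId + "-" + name + "-repartition";
        }
    }

    // Wraps the default naming and prepends "<clusterAlias>." so Streams
    // would look for e.g. "prod.App-statestore-changelog".
    static class AliasPrefixedNaming implements InternalTopicNamingStrategy {
        private final String prefix;
        private final InternalTopicNamingStrategy delegate = new DefaultNaming();

        AliasPrefixedNaming(String clusterAlias) {
            this.prefix = clusterAlias + ".";
        }

        public String changelogTopic(String applicationId, String storeName) {
            return prefix + delegate.changelogTopic(applicationId, storeName);
        }

        public String repartitionTopic(String applicationId, String name) {
            return prefix + delegate.repartitionTopic(applicationId, name);
        }
    }

    public static void main(String[] args) {
        InternalTopicNamingStrategy naming = new AliasPrefixedNaming("prod");
        // Prints prod.App-statestore-changelog
        System.out.println(naming.changelogTopic("App", "statestore"));
    }
}
```

Wrapping the default strategy rather than re-implementing it keeps today's naming as the baseline, with the alias as the only difference.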

I phrased the question in terms of my team's problem, but it's worth noting
that this use case is quite similar to a potential DR use case, where a DR
cluster normally just receives data mirrored by MM2, but in a DR scenario
becomes the active cluster that Streams applications connect to.

Thanks for considering this issue, and great job to those working on MM2 so
far!

Paul
