Hi Paul,

Thanks for the email. This does seem like a good setup to support.

This might seem a little low-fi, but do you think it would work to handle this use case entirely in "user space"? I may be missing something because this is off the cuff...

In the code for your Streams app, I'm wondering if you can prepend your input/output topics with a config-driven string like:

    builder.stream(config.getEnvPrefix() + "my-input-topic")

Regarding internal topics, I think the issue might be more complicated than just naming. I'm assuming you wish to load the changelog into the pre-prod app so that it can just "restore" the prod app's state and continue processing from there. But, the prod app will still be running, and its changelog will still be mirrored into pre-prod when you start the pre-prod app. Then, you'd basically have both prod and pre-prod writing into the pre-prod changelog at the same time. This seems likely to produce undesirable behavior.

Also, the pre-prod app won't be in the same consumer group as the prod app, so it won't know from what offset to start processing input. It will load newer changelog state from prod and then start processing older events, probably producing different results from production anyway.

If you can constrain the testing effort to be limited to only the mirrored "external" topics, I think you'll get more predictable results.

But as I noted, this is off the cuff. Please let me know if I've overlooked something.

Thanks,
-John

On Sun, Mar 24, 2019 at 12:31 PM Paul Whalen <pgwha...@gmail.com> wrote:

> Hi all,
>
> With MirrorMaker 2.0 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> )
> accepted and coming along very nicely in development, it has got me
> wondering if a certain use case is supported, and if not, can changes be
> made to Streams or MM2 to support it. I'll explain the use case, but the
> TL;DR here is "do we need more control over topic naming in MM2 or
> Streams?"
>
> My team foresees using MM2 as a way to mirror data from our prod
> environment to a pre-prod environment. The data is supplied by external
> vendors, introduced into our system through a Kafka Streams ETL pipeline,
> and consumed by our end-applications. Generally we would only like to run
> the ETL pipeline in prod since there is an operational cost to running it
> in both prod and pre-prod (the data sometimes needs manual attention).
> This seems to fit MM2 well: pre-prod end-applications consume from the
> pre-prod Kafka cluster, which is entirely "remote" topics being mirrored
> from the prod cluster. We only have to keep one instance of the ETL
> pipeline running, but end-applications can be separate, connecting to their
> respective prod and pre-prod Kafka clusters.
>
> However, when we want to test changes to the ETL pipeline itself, we would
> like to turn off the mirroring from prod to pre-prod, and run the ETL
> pipeline also in pre-prod, picking up the most recent state of the prod
> pipeline from when mirroring was turned off (FWIW, downtime is not an issue
> for our use case).
>
> My question/concern is basically, can Streams apps work when they're
> running against topics prepended with a cluster alias, like
> "pre-prod.App-statestore-changelog" as is the plan with MM2. From what I
> can tell the answer is no, and my proposal would be to give the Streams
> user more specific control over how Streams names its internal topics
> (repartition and changelogs) by defining an "InternalTopicNamingStrategy"
> or similar. Perhaps there is a solution on the MM2 side as well, but it
> seems much less desirable to budge on that convention.
>
> I phrased the question in terms of my team's problem, but it's worth noting
> that this use case is passably similar to a potential DR use case, where
> there is a DR cluster that is normally just being mirrored to by MM2, but
> in a DR scenario would become the active cluster that Streams applications
> are connected to.
>
> Thanks for considering this issue, and great job to those working on MM2 so
> far!
>
> Paul
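
For reference, the "user space" prefixing idea from the reply at the top could be sketched roughly as below. This is only an illustration under assumptions: `EnvTopics` and the `env.prefix` config key are hypothetical names (neither is part of Kafka or MM2), and the example prefix `prod.` merely assumes the mirrored topics carry the source cluster alias followed by a dot. It covers only input/output topic names, not internal (changelog/repartition) topics.

```java
import java.util.Map;

/**
 * Hypothetical helper (not a Kafka API): builds per-environment topic names
 * by prepending a config-driven prefix to a base topic name.
 */
final class EnvTopics {
    private final String prefix;

    EnvTopics(String prefix) {
        // Treat a missing prefix as "no prefix" so prod can run unprefixed.
        this.prefix = (prefix == null) ? "" : prefix;
    }

    /** Reads the prefix from an app config map; "env.prefix" is an assumed key. */
    static EnvTopics fromConfig(Map<String, String> config) {
        return new EnvTopics(config.getOrDefault("env.prefix", ""));
    }

    /** Returns the environment-specific name for a base topic name. */
    String topic(String baseName) {
        return prefix + baseName;
    }
}
```

In the Streams app this would be used along the lines of `builder.stream(topics.topic("my-input-topic"))`, with `env.prefix` left empty in prod and set to something like `prod.` in pre-prod so the app reads the mirrored topics.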