Hi Paul,

No problem! And please let us know how it goes.

Thanks,
-John

On Wed, Mar 27, 2019 at 9:13 PM Paul Whalen <pgwha...@gmail.com> wrote:

> John,
>
> You make a good case for it already being a public API, so my nerves are definitely eased on that front. I do think we have a path to move forward with the user space solution, and if I get a chance, I'm going to try proving it out with a trivial demo using an early MM2 build - but it's nice to hear your support of the use case regardless. The ACL concern makes a lot of sense, and while I don't think it would be a deal breaker because of what you say about advanced control naturally requiring extra care, I'm generally against the added complexity of custom topic naming unless we really need it. It looks like MM2 will also support optional ACL mirroring, so that should only make things easier.
>
> Regarding the management burden of doing these switchovers: fortunately our case is something like running in pre-prod maybe 3 consecutive days out of the month, and just prod for the rest of the month. So even if it weren't the most effortless or fastest process, we could tolerate it. Though if it were easy, I wouldn't be surprised if others wanted a similar workflow with much faster iteration - spinning up a new environment with the same data as prod is always a boon.
>
> Thanks again!
> Paul
>
> On Wed, Mar 27, 2019 at 2:17 PM John Roesler <j...@confluent.io> wrote:
>
> > Hi Paul,
> >
> > Sorry for overlooking the "offset translation" MM2 feature. I'm glad Ryanne was able to confirm this would work.
> >
> > I'm just one voice, but FWIW, I think that the internal topic naming scheme is a public API. We document the structure of the naming scheme in several places. We also recommend making use of the fact that the applicationId is a prefix of the topic name in conjunction with Kafka broker ACLs to grant access to the internal topics to the applications that own them.
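The naming scheme and the ACL recommendation can be modelled in a few lines of plain Java. This is a minimal, JDK-only sketch, not Kafka code: the class and method names are hypothetical, the string patterns follow the documented `<application.id>-<storeName>-changelog` and `<application.id>-<operatorName>-repartition` conventions, and `coveredByPrefixedAcl` only imitates the string-prefix matching of a `PREFIXED` topic ACL rather than talking to a broker.

```java
import java.util.List;

/** Hypothetical helper modelling Kafka Streams' internal topic naming convention. */
final class InternalTopicNames {
    private InternalTopicNames() {}

    /** Changelog topic for a state store: <application.id>-<storeName>-changelog. */
    static String changelog(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    /** Repartition topic: <application.id>-<operatorName>-repartition. */
    static String repartition(String applicationId, String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }

    /** Imitates a PREFIXED topic ACL: it applies to every topic name starting with the prefix. */
    static boolean coveredByPrefixedAcl(String aclPrefix, String topic) {
        return topic.startsWith(aclPrefix);
    }

    public static void main(String[] args) {
        String appId = "App"; // hypothetical application.id
        List<String> internal = List.of(
                changelog(appId, "statestore"),
                repartition(appId, "by-vendor")); // "by-vendor" is a hypothetical operator name
        for (String topic : internal) {
            System.out.println(topic + " covered: " + coveredByPrefixedAcl(appId, topic));
        }
    }
}
```

Because a prefixed ACL is a plain string-prefix match, an ACL on `App` would also cover topics of any application whose id merely starts with `App` (a hypothetical `App2`, say), so distinct application ids matter. An alias-prefixed mirror such as `prod.App-statestore-changelog` is not covered at all, which relates to the access-control concern in John's reply.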
> >
> > Actually, for this latter reason, I'm concerned that giving more control over the names of internal topics might make topic security and access control more difficult. Or maybe this concern is off-base, and folks who take advanced control over the topic name would also take on the responsibility to make sure their naming scheme works in conjunction with their broker configs.
> >
> > For whatever reason, I hadn't considered prefixing the application's id with "pre-prod.". Offhand, I think this would achieve the desired outcome. There may be some devil in the details, of course.
> >
> > Glad to hear, by the way, that you've already considered the problem of concurrent modifications to the changelogs (etc.). It sounds like your plan should work, although it might become a management burden if you start wanting to run a lot of these stream-app tests. In that case, you could consider mirroring the relevant topics *again* into a test-specific prefix (like "pre-prod.test-1.") up to some point. Then, you could stop the mirror, run the test, verify the results, and then just delete the whole test dataset.
> >
> > Does it seem like you have a good path forward? From what I'm hearing, the "user-space" approach is at least worth exploring before considering a new API. Of course, if it doesn't pan out for whatever reason, I'd (personally) support adding whatever features are necessary to support your use case.
> >
> > Thanks,
> > -John
> >
> > On Mon, Mar 25, 2019 at 9:40 PM Paul Whalen <pgwha...@gmail.com> wrote:
> >
> > > John and Ryanne,
> > >
> > > Thanks for the responses! I think Ryanne's way of describing the question is actually a much better summary than my long-winded description: "a Streams app can switch between topics with and without a cluster alias prefix when you migrate between prod and pre-prod, while preserving state."
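Why John's suggestion of prefixing the application id could work comes down to string concatenation. Under MM2's default replication policy as described in KIP-382, a mirrored topic is named `<sourceClusterAlias>.<topic>`, and a Streams changelog is `<application.id>-<storeName>-changelog`. If the pre-prod app uses `<alias>.<original application.id>` as its application id, the changelog it expects is exactly the mirrored copy of the prod changelog. A JDK-only sketch under those assumptions; the alias `prod`, application id `App`, store name, and helper names are hypothetical, and the thread itself uses both `prod.` and `pre-prod.` depending on which side is described. It checks names only, not offsets or topic contents.

```java
/**
 * Sketch: an application.id prefixed the same way MM2 prefixes mirrored topics
 * yields internal topic names that coincide with the mirrored prod topics.
 */
final class PrefixedAppIdCheck {
    private PrefixedAppIdCheck() {}

    /** Assumed MM2 default replication policy naming: <sourceClusterAlias>.<topic>. */
    static String remoteTopic(String sourceAlias, String topic) {
        return sourceAlias + "." + topic;
    }

    /** Kafka Streams changelog naming: <application.id>-<storeName>-changelog. */
    static String changelog(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    /** Hypothetical convention: prefix the application id exactly like the mirrored topics. */
    static String prefixedApplicationId(String sourceAlias, String applicationId) {
        return sourceAlias + "." + applicationId;
    }

    public static void main(String[] args) {
        String alias = "prod";       // hypothetical alias of the cluster being mirrored from
        String appId = "App";        // hypothetical prod application.id
        String store = "statestore"; // hypothetical store name

        String mirrored = remoteTopic(alias, changelog(appId, store));
        String expected = changelog(prefixedApplicationId(alias, appId), store);

        System.out.println("mirrored changelog:   " + mirrored);
        System.out.println("pre-prod app expects: " + expected);
        System.out.println("match: " + mirrored.equals(expected));
    }
}
```

Since Streams uses `application.id` as its consumer group id, the prefixed id is also the group under which translated offsets would need to be committed. The same reasoning applies to a test-specific prefix such as `pre-prod.test-1.`, provided the mirrored topics and the application id carry the identical prefix.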
> > >
> > > To address a few of John's points...
> > >
> > > > But, the prod app will still be running, and its changelog will still be mirrored into pre-prod when you start the pre-prod app.
> > >
> > > The idea is actually to turn off the mirroring from prod to pre-prod during this period, so the environments can operate completely independently and their state can comfortably diverge during the testing period. After the testing period we'd be happy to throw away everything in pre-prod and start mirroring again from prod with a blank slate.
> > >
> > > > Also, the pre-prod app won't be in the same consumer group as the prod app, so it won't know from what offset to start processing input.
> > >
> > > This is where I'm hoping the magic of MM2 will come in - at the time we shut off mirroring from prod to pre-prod in order to spin off the pre-prod environment, we will do an "offset translation" with RemoteClusterUtils like Ryanne mentioned, so new Streams apps in pre-prod will see consumer offsets that make sense for reading from pre-prod topics.
> > >
> > > I like both of your ideas around the "user space" solution: subscribing to multiple topics, or choosing a topic based on config. However, in order to populate their internal state properly, when the pre-prod apps come up they will need to look for repartition and changelog topics with the right prefix. This seems problematic to me since the user doesn't have direct control over those topic names, though it did just occur to me now that the user *sort of* does. Since the naming scheme is currently just applicationId + "-" + storeName + "-changelog", we could translate the consumer group offsets to a consumer group with a new name that has the same prefix as the mirrored topics do. That seems a bit clumsy/lucky to me (is the internal topic naming convention really a "public API"?), but I think it would work.
> > >
> > > I'd be curious to hear if folks think that solution would work and be an acceptable pattern, since my original proposal of more user control of internal topic naming did seem a bit heavy-handed.
> > >
> > > Thanks very much for your help!
> > > Paul
> > >
> > > On Mon, Mar 25, 2019 at 3:14 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > >
> > > > Hey Paul, thanks for the kind words re MM2.
> > > >
> > > > I'm not a Streams expert first off, but I think I understand your question: whether a Streams app can switch between topics with and without a cluster alias prefix when you migrate between prod and pre-prod, while preserving state. Streams supports regexes and lists of topics as input, so you can use e.g. builder.stream(List.of("topic1", "prod.topic1")), which is a good place to start. In this case, the combined subscription is still a single stream, conceptually, but comprises partitions from both topics, i.e. partitions from topic1 plus partitions from prod.topic1. At a high level, this is no different than adding more partitions to a single topic. I think any intermediate or downstream topics/tables would remain unchanged, since they are still the result of this single stream.
> > > >
> > > > The trick is to correctly translate offsets for the input topics when migrating the app between prod and pre-prod, which RemoteClusterUtils can help with. You could do this with external tooling, e.g. a script leveraging RemoteClusterUtils and kafka-streams-application-reset.sh. I haven't tried this with a Streams app myself, but I suspect it would work.
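Ryanne's suggestion of consuming both the local and the alias-prefixed topic can also be written as one regex instead of a list. The sketch below only builds and exercises the pattern with `java.util.regex`; it assumes full-match semantics for topic-pattern subscriptions and uses hypothetical topic names. The resulting `Pattern` could be handed to `StreamsBuilder#stream(Pattern)`, while the explicit list form corresponds to `StreamsBuilder#stream(Collection<String>)`, as in the `List.of(...)` example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch: one subscription pattern covering "topic1" and its mirror "prod.topic1". */
final class MirrorAwareSubscription {
    private MirrorAwareSubscription() {}

    /** Builds (<alias>.)?<topic>, quoting both parts so '.' and '-' are matched literally. */
    static Pattern localOrMirrored(String alias, String topic) {
        return Pattern.compile("(" + Pattern.quote(alias + ".") + ")?" + Pattern.quote(topic));
    }

    /** Returns the topics the pattern matches in full (full-match semantics assumed). */
    static List<String> matching(Pattern pattern, List<String> topics) {
        return topics.stream()
                .filter(t -> pattern.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern p = localOrMirrored("prod", "topic1");
        List<String> clusterTopics = List.of("topic1", "prod.topic1", "topic10", "prod.topic2");
        System.out.println(matching(p, clusterTopics));
    }
}
```

Quoting keeps the `.` after the alias literal, so a name like `prodXtopic1` does not match, and requiring a full match keeps `topic10` out. Offsets for these input topics still have to be translated separately, as discussed in the thread.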
> > > >
> > > > Ryanne
> > > >
> > > > On Sun, Mar 24, 2019 at 12:31 PM Paul Whalen <pgwha...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > With MirrorMaker 2.0 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0) accepted and coming along very nicely in development, it has got me wondering whether a certain use case is supported, and if not, whether changes can be made to Streams or MM2 to support it. I'll explain the use case, but the TL;DR here is "do we need more control over topic naming in MM2 or Streams?"
> > > > >
> > > > > My team foresees using MM2 as a way to mirror data from our prod environment to a pre-prod environment. The data is supplied by external vendors, introduced into our system through a Kafka Streams ETL pipeline, and consumed by our end-applications. Generally, we would like to run the ETL pipeline only in prod, since there is an operational cost to running it in both prod and pre-prod (the data sometimes needs manual attention). This seems to fit MM2 well: pre-prod end-applications consume from the pre-prod Kafka cluster, which consists entirely of "remote" topics mirrored from the prod cluster. We only have to keep one instance of the ETL pipeline running, but end-applications can be separate, connecting to their respective prod and pre-prod Kafka clusters.
> > > > >
> > > > > However, when we want to test changes to the ETL pipeline itself, we would like to turn off the mirroring from prod to pre-prod, and run the ETL pipeline also in pre-prod, picking up the most recent state of the prod pipeline from when mirroring was turned off (FWIW, downtime is not an issue for our use case).
> > > > >
> > > > > My question/concern is basically: can Streams apps work when they're running against topics prepended with a cluster alias, like "pre-prod.App-statestore-changelog", as is the plan with MM2? From what I can tell, the answer is no, and my proposal would be to give the Streams user more specific control over how Streams names its internal topics (repartition and changelogs) by defining an "InternalTopicNamingStrategy" or similar. Perhaps there is a solution on the MM2 side as well, but it seems much less desirable to budge on that convention.
> > > > >
> > > > > I phrased the question in terms of my team's problem, but it's worth noting that this use case is passably similar to a potential DR use case, where there is a DR cluster that is normally just being mirrored to by MM2, but in a DR scenario would become the active cluster that Streams applications are connected to.
> > > > >
> > > > > Thanks for considering this issue, and great job to those working on MM2 so far!
> > > > >
> > > > > Paul
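To make the "InternalTopicNamingStrategy" idea concrete, here is one shape it might take. This interface is hypothetical: Kafka Streams does not provide it, and the method names and the decorator-style `AliasPrefixedNaming` are illustrative assumptions rather than a proposed API. The default implementation reproduces the current convention, and the prefixed one yields names such as `pre-prod.App-statestore-changelog`.

```java
/** Demo entry point. */
final class NamingStrategyDemo {
    public static void main(String[] args) {
        InternalTopicNamingStrategy prod = new DefaultNaming();
        InternalTopicNamingStrategy preProd = new AliasPrefixedNaming("pre-prod", prod);
        System.out.println(prod.changelogTopic("App", "statestore"));
        System.out.println(preProd.changelogTopic("App", "statestore"));
    }
}

/** Hypothetical interface sketched from the proposal in this thread; not a Kafka Streams API. */
interface InternalTopicNamingStrategy {
    String changelogTopic(String applicationId, String storeName);

    String repartitionTopic(String applicationId, String operatorName);
}

/** Reproduces the current convention: <application.id>-<name>-<suffix>. */
final class DefaultNaming implements InternalTopicNamingStrategy {
    @Override
    public String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    @Override
    public String repartitionTopic(String applicationId, String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }
}

/** Wraps another strategy and prepends "<alias>." to every internal topic name. */
final class AliasPrefixedNaming implements InternalTopicNamingStrategy {
    private final String alias;
    private final InternalTopicNamingStrategy delegate;

    AliasPrefixedNaming(String alias, InternalTopicNamingStrategy delegate) {
        this.alias = alias;
        this.delegate = delegate;
    }

    @Override
    public String changelogTopic(String applicationId, String storeName) {
        return alias + "." + delegate.changelogTopic(applicationId, storeName);
    }

    @Override
    public String repartitionTopic(String applicationId, String operatorName) {
        return alias + "." + delegate.repartitionTopic(applicationId, operatorName);
    }
}
```

One trade-off, echoing John's reply: names produced by `AliasPrefixedNaming` no longer start with the application id, so a prefixed ACL on the application id alone would not cover them.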