Hi Paul,

No problem! And please let us know how it goes.

Thanks,
-John

On Wed, Mar 27, 2019 at 9:13 PM Paul Whalen <pgwha...@gmail.com> wrote:

> John,
>
> You make a good case for it already being a public API, so my nerves are
> definitely eased on that front. I do think we have a path to move forward
> with the user space solution, and if I get a chance, I'm going to try
> proving it out with a trivial demo using an early MM2 build - but it's nice
> to hear your support of the use case regardless.  The ACL concern makes a
> lot of sense, and while I don't think it would be a deal breaker because of
> what you say about advanced control naturally requiring extra care, I'm
> generally against the added complexity of custom topic naming unless we
> really need it.  It looks like MM2 will also support optional ACL
> mirroring, so that should only make things easier.
>
> Regarding the management burden of doing these switchovers: fortunately our
> case is something like running in pre-prod maybe 3 consecutive days out of
> the month, and just prod for the rest of the month.  So if it wasn't the
> most effortless or fast process we could tolerate it.  Though if it was
> easy I wouldn't be surprised if others wanted a similar workflow with much
> faster iteration - spinning up a new environment with the same data as prod
> is always a boon.
>
> Thanks again!
> Paul
>
> On Wed, Mar 27, 2019 at 2:17 PM John Roesler <j...@confluent.io> wrote:
>
> > Hi Paul,
> >
> > Sorry for overlooking the "offset translation" MM2 feature. I'm glad
> > Ryanne was able to confirm this would work.
> >
> > I'm just one voice, but FWIW, I think that the internal topic naming
> > scheme is a public API. We document the structure of the naming
> > scheme in several places. We also recommend making use of the fact
> > that the applicationId is a prefix of the topic name in conjunction with
> > Kafka Broker ACLs to grant access to the internal topics to the
> > applications that own them.
> >
> > Actually, for this latter reason, I'm concerned that giving more control
> > over the names of internal topics might make topic security and
> > access control more difficult. Or maybe this concern is off-base, and
> > folks who take advanced control over the topic name would also take
> > on the responsibility to make sure their naming scheme works in
> > conjunction with their broker configs.
> >
> > For whatever reason, I hadn't considered prefixing the application's
> > id with "pre-prod.". Offhand, I think this would achieve the desired
> > outcome. There may be some devil in the details, of course.
> >
> >
> > Glad to hear, by the way, that you've already considered the problem
> > of concurrent modifications to the changelogs (etc.). It sounds like
> > your plan should work, although it might become a management burden
> > if you start wanting to run a lot of these stream-app tests. In that
> case,
> > you could consider mirroring the relevant topics *again* into a
> > test-specific
> > prefix (like "pre-prod.test-1.", up to some point. Then, you could stop
> > the mirror, run the test, verify the results, and then just delete the
> > whole test dataset.
> >
> >
> > Does it seem like you have a good path forward? From what I'm
> > hearing, the "user-space" approach is at least worth exploring before
> > considering a new API. Of course, if it doesn't pan out for whatever
> > reason,
> > I'd (personally) support adding whatever features are necessary to
> support
> > your use case.
> >
> > Thanks,
> > -John
> >
> >
> >
> > On Mon, Mar 25, 2019 at 9:40 PM Paul Whalen <pgwha...@gmail.com> wrote:
> >
> > > John and Ryanne,
> > >
> > > Thanks for the responses! I think Ryanne's way of describing the
> question
> > > is actually a much better summary than my long winded description: "a
> > > Streams app can switch between topics with and without a cluster alias
> > > prefix when you migrate between prod and pre-prod, while preserving
> > state."
> > >
> > > To address a few of John's points...
> > >
> > > But, the prod app will still be running, and its changelog will still
> be
> > > > mirrored into pre-prod when you start the pre-prod app.
> > > >
> > > The idea is actually to turn off the mirroring from prod to pre-prod
> > during
> > > this period, so the environments can operate completely independently
> and
> > > their state can comfortably diverge during the testing period.  After
> the
> > > testing period we'd be happy to throw away everything in pre-prod and
> > start
> > > mirroring again from prod with a blank slate.
> > >
> > > Also, the pre-prod app won't be in the same consumer group as the prod
> > app,
> > > > so it won't know from what offset to start processing input.
> > > >
> > > This is where I'm hoping the magic of MM2 will come in - at the time we
> > > shut off mirroring from prod to pre-prod in order to spin of the
> pre-prod
> > > environment, we will do an "offset translation" with RemoteClusterUtils
> > > like Ryanne mentioned, so new Streams apps in pre-prod will see
> consumer
> > > offsets that make sense for reading from pre-prod topics.
> > >
> > > I like both of your ideas around the "user space" solution: subscribing
> > to
> > > multiple topics, or choosing a topic based on config.  However, in
> order
> > to
> > > populate their internal state properly, when the pre-prod apps come up
> > they
> > > will need to look for repartition and changelog topics with the right
> > > prefix.  This seems problematic to me since the user doesn't have
> direct
> > > control over those topic names, though it did just occur to me now that
> > the
> > > user *sort of* does.  Since the naming scheme is currently just
> > > applicationId + "-" + storeName + "-changelog", we could translate the
> > > consumer group offsets to a consumer group with a new name that has the
> > > same prefix as the mirrored topics do.  That seems a bit clumsly/lucky
> to
> > > me (is the internal topic naming convention really a "public API"?),
> but
> > I
> > > think it would work.
> > >
> > > I'd be curious to hear if folks think that solution would work and be
> an
> > > acceptable pattern, since my original proposal of more user control of
> > > internal topic naming did seem a bit heavy handed.
> > >
> > > Thanks very much for your help!
> > > Paul
> > >
> > > On Mon, Mar 25, 2019 at 3:14 PM Ryanne Dolan <ryannedo...@gmail.com>
> > > wrote:
> > >
> > > > Hey Paul, thanks for the kind words re MM2.
> > > >
> > > > I'm not a Streams expert first off, but I think I understand your
> > > question:
> > > > if a Streams app can switch between topics with and without a cluster
> > > alias
> > > > prefix when you migrate between prod and pre-prod, while preserving
> > > state.
> > > > Streams supports regexes and lists of topics as input, so you can use
> > > e.g.
> > > > builder.stream(List.of("topic1", "prod.topic1")), which is a good
> place
> > > to
> > > > start. In this case, the combined subscription is still a single
> > stream,
> > > > conceptually, but comprises partitions from both topics, i.e.
> > partitions
> > > > from topic1 plus partitions from prod.topic1. At a high level, this
> is
> > no
> > > > different than adding more partitions to a single topic. I think any
> > > > intermediate or downstream topics/tables would remain unchanged,
> since
> > > they
> > > > are still the result of this single stream.
> > > >
> > > > The trick is to correctly translate offsets for the input topics when
> > > > migrating the app between prod and pre-prod, which RemoteClusterUtils
> > can
> > > > help with. You could do this with external tooling, e.g. a script
> > > > leveraging RemoteClusterUtils and
> kafka-streams-application-reset.sh. I
> > > > haven't tried this with a Streams app myself, but I suspect it would
> > > work.
> > > >
> > > > Ryanne
> > > >
> > > >
> > > > On Sun, Mar 24, 2019 at 12:31 PM Paul Whalen <pgwha...@gmail.com>
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > With MirrorMaker 2.0 (
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > > > )
> > > > > accepted and coming along very nicely in development, it has got me
> > > > > wondering if a certain use case is supported, and if not, can
> changes
> > > be
> > > > > made to Streams or MM2 to support it.  I'll explain the use case,
> but
> > > the
> > > > > TL;DR here is "do we need more control over topic naming in MM2 or
> > > > > Streams?"
> > > > >
> > > > > My team foresees using MM2 as a way to mirror data from our prod
> > > > > environment to a pre-prod environment.  The data is supplied by
> > > external
> > > > > vendors, introduced into our system through a Kafka Streams ETL
> > > pipeline,
> > > > > and consumed by our end-applications.  Generally we would only like
> > to
> > > > run
> > > > > the ETL pipeline in prod since there is an operational cost to
> > running
> > > it
> > > > > in both prod and pre-prod (the data sometimes needs manual
> > attention).
> > > > > This seems to fit MM2 well: pre-prod end-applications consume from
> > the
> > > > > pre-prod Kafka cluster, which is entirely "remote" topics being
> > > mirrored
> > > > > from the prod cluster.  We only have to keep one instance of the
> ETL
> > > > > pipeline running, but end-applications can be separate, connecting
> to
> > > > their
> > > > > respective prod and pre-prod Kafka clusters.
> > > > >
> > > > > However, when we want to test changes to the ETL pipeline itself,
> we
> > > > would
> > > > > like to turn off the mirroring from prod to pre-prod, and run the
> ETL
> > > > > pipeline also in pre-prod, picking up the most recent state of the
> > prod
> > > > > pipeline from when mirroring was turned off (FWIW, downtime is not
> an
> > > > issue
> > > > > for our use case).
> > > > >
> > > > > My question/concern is basically, can Streams apps work when
> they're
> > > > > running against topics prepended with a cluster alias, like
> > > > > "pre-prod.App-statestore-changelog" as is the plan with MM2. From
> > what
> > > I
> > > > > can tell the answer is no, and my proposal would be to give the
> > Streams
> > > > > user more specific control over how Streams names its internal
> topics
> > > > > (repartition and changelogs) by defining an
> > > "InternalTopicNamingStrategy"
> > > > > or similar.  Perhaps there is a solution on the MM2 side as well,
> but
> > > it
> > > > > seems much less desirable to budge on that convention.
> > > > >
> > > > > I phrased the question in terms of my team's problem, but it's
> worth
> > > > noting
> > > > > that this use case is passably similar to a potential DR use case,
> > > where
> > > > > there is a DR cluster that is normally just being mirrored to by
> MM2,
> > > but
> > > > > in a DR scenario would become the active cluster that Streams
> > > > applications
> > > > > are connected to.
> > > > >
> > > > > Thanks for considering this issue, and great job to those working
> on
> > > MM2
> > > > so
> > > > > far!
> > > > >
> > > > > Paul
> > > > >
> > > >
> > >
> >
>

Reply via email to