Hi Ryanne

I think we have inconsistent definitions of Active-Active. The producer is
only producing to one cluster (primary) and one topic (topic "table"), and
the other cluster (secondary) contains only a replication of the data via
MM2 ("primary.table"). What you seemed to be proposing is that the
producer's "table" data is sent fully to each cluster, such that the state
can be materialized as a KTable in each application running on each
cluster. This wouldn't require MM2 at all, so I'm not sure if this is what
you advocated.

You also state that "As with normal consumers, the Streams app should
*subscribe
to any remote topics*, e.g. with a regex, s.t. the application state will
reflect input from either source cluster.". Wouldn't this mean that the
stateful "table" topic that we wish to materialize would be replicated by
MM2 from Primary, such that we end up with the following:

*Replicated Entity/Stateful Data:*
*Primary Cluster: (Live)*
Topic: "table" (contains data from T = 0 to T = n)
SteamsAppPrimary is consuming from ("table")

*Secondary Cluster: (Live)*
Topic: "primary.table" (contains data from T = 0 to T = n)
SteamsAppSecondary is consuming from ("primary.table")

What does StreamsAppSecondary do when "primary.table" is no longer
replicated because Primary has died? Additionally, where should the
producer of topic "table" now write its data to, assuming that Primary
Cluster is irrevocably lost?

I hope this better outlines my scenario. The trivial solution seems to be
to make your producers produce all stateful data (topic "table") to each
cluster, which makes MM2 unnecessary, but can also lead to data
inconsistencies so it's not exactly foolproof.

Thanks

On Mon, Jul 22, 2019 at 6:32 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Hello Adam, thanks for the questions. Yes my organization uses Streams,
> and yes you can use Streams with MM2/KIP-382, though perhaps not in the way
> you are describing.
>
> The architecture you mention is more "active/standby" than "active/active"
> IMO. The "secondary" cluster is not being used until a failure, at which
> point you migrate your app and expect the data to already be there. This
> works for normal consumers where you can seek() and --reset-offsets.
> Streams apps can be reset with the kafka-streams-application-reset tool,
> but as you point out, that doesn't help with rebuilding an app's internal
> state, which would be missing on the secondary cluster. (Granted, that may
> be okay depending on your particular application.)
>
> A true "active/active" solution IMO would be to run your same Streams app
> in _both_ clusters (primary, secondary), s.t. the entire application state
> is available and continuously updated in both clusters. As with normal
> consumers, the Streams app should subscribe to any remote topics, e.g. with
> a regex, s.t. the application state will reflect input from either source
> cluster.
>
> This is essentially what Streams' "standby replicas" are -- extra copies
> of application state to support quicker failover. Without these replicas,
> Streams would need to start back at offset 0 and re-process everything in
> order to rebuild state (which you don't want to do during a disaster,
> especially!). The same logic applies to using Streams with MM2. You _could_
> failover by resetting the app and rebuilding all the missing state, or you
> could have a copy of everything sitting there ready when you need it. The
> easiest way to do the latter is to run your app in both clusters.
>
> Hope that helps.
>
> Ryanne
>
> On Mon, Jul 22, 2019 at 3:11 PM Adam Bellemare <adam.bellem...@gmail.com>
> wrote:
>
>> Hi Ryanne
>>
>> I have a quick question for you about Active+Active replication and Kafka
>> Streams. First, does your org /do you use Kafka Streams? If not then I
>> think this conversation can end here. ;)
>>
>> Secondly, and for the broader Kafka Dev group - what happens if I want to
>> use Active+Active replication with my Kafka Streams app, say, to
>> materialize a simple KTable? Based on my understanding, I topic "table" on
>> the primary cluster will be replicated to the secondary cluster as
>> "primary.table". In the case of a full cluster failure for primary, the
>> producer to topic "table" on the primary switches over to the secondary
>> cluster, creates its own "table" topic and continues to write to there. So
>> now, assuming we have had no data loss, we end up with:
>>
>>
>> *Primary Cluster: (Dead)*
>>
>>
>> *Secondary Cluster: (Live)*
>> Topic: "primary.table" (contains data from T = 0 to T = n)
>> Topic: "table" (contains data from T = n+1 to now)
>>
>> If I want to materialize state from using Kafka Streams, obviously I am
>> now in a bit of a pickle since I need to consume "primary.table" before I
>> consume "table". Have you encountered rebuilding state in Kafka Streams
>> using Active-Active? For non-Kafka Streams I can see using a single
>> consumer for "primary.table" and one for "table", interleaving the
>> timestamps and performing basic event dispatching based on my own tracked
>> stream-time, but for Kafka Streams I don't think there exists a solution to
>> this.
>>
>> If you have any thoughts on this or some recommendations for Kafka
>> Streams with Active-Active I would be very appreciative.
>>
>> Thanks
>> Adam
>>
>>
>>

Reply via email to