Cyrille, I don't see why using MM1/2 would break your isolation requirement. But if you can't mirror topics for some reason, consider Flink instead of Kafka Streams.
Ryanne

On Thu, Feb 13, 2020 at 10:52 AM Cyrille Karmann <cyri...@nnamrak.org> wrote:

> Hello,
>
> We are trying to create a streaming pipeline of data between different
> Kafka clusters. Our users send data to the input Kafka cluster, and we want
> to process this data and send the result to topics on another Kafka
> cluster.
>
> We have different reasons for this setup, but mainly it's for isolation:
> the two clusters don't have to have the same configuration and the first
> "input" Kafka cluster is critical: we want to be able to do maintenance on
> the second cluster without impacting the first one. Also we have more than
> a thousand topics on each side so managing them separately is easier.
>
> We are investigating different technologies for the processing part, and
> Kafka Streams looked promising, except that it apparently does not support
> writing to a different cluster than the one it reads from.
>
> I saw people on forums suggesting to write to the first cluster and use
> MirrorMaker to channel the data to the output cluster. This breaks our
> isolation requirements and adds more latency, so we don't want to do that.
>
> I have two questions:
>
> - Is there a reason behind the constraint that Kafka Streams cannot
> produce to a different cluster? I see that Kafka Streams allows specifying
> a different configuration for the producer, but it explicitly disallows it
> for ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, so it is definitely something
> the developers did not want to support (
> https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/StreamsConfig.html#getMainConsumerConfigs-java.lang.String-java.lang.String-
> )
> but I am not clear why it is so.
>
> - At the same time, there is the KafkaClientSupplier mechanism that allows
> us to inject our own KafkaProducer. I was actually successful in injecting
> such a KafkaProducer that connects to a different cluster. The fact that I
> am able to do, using a not very well documented API, something that other
> parts of the Kafka Streams library try to prevent me from doing, makes me
> wonder if I am breaking something while doing this. In particular, one
> thing important to me is exactly-once processing, so I want to be sure it
> would still work.
>
> Thanks,
> Cyrille
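
For reference, the mirroring route mentioned in the reply could be set up roughly as follows with MirrorMaker 2 (shipped with Kafka 2.4 and later). This is only a sketch under assumptions: the cluster aliases "input" and "output", the host names, and the "results\..*" topic pattern are placeholders that do not come from this thread. The idea is that Kafka Streams reads from and writes to the input cluster only, and MM2 copies the result topics one way to the output cluster.

```properties
# mm2.properties (hypothetical file name)
# Cluster aliases are arbitrary labels chosen for this sketch.
clusters = input, output

# Placeholder broker addresses; replace with the real ones.
input.bootstrap.servers = input-broker-1:9092
output.bootstrap.servers = output-broker-1:9092

# Replicate one way only: input -> output.
input->output.enabled = true
output->input.enabled = false

# Assumed naming convention: only topics starting with "results." are mirrored.
input->output.topics = results\..*
```

Such a file would typically be passed to bin/connect-mirror-maker.sh. Note that with the default replication policy, MM2 prefixes mirrored topic names with the source cluster alias (here "input."), so consumers on the output cluster would see "input.results..." topics unless a different replication policy is configured.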