Thanks Jun, I've updated the KIP as requested. Brief notes below:

100. added "...out-of-the-box (without custom handlers)..."
101. done. Good idea to include a MessageFormatter.
102. done.

> 103. [...] why is Heartbeat a separate connector?

Heartbeats themselves are replicated via MirrorSource/SinkConnector, so if
replication stops, you'll stop seeing heartbeats in downstream clusters.
I've updated the KIP to make this clearer and have added a bullet to
Rejected Alternatives.

104. added "heartbeat.retention.ms", "checkpoint.retention.ms", thanks. The
heartbeat topic doesn't need to be compacted.

> 105. [...] I am not sure why targetClusterAlias is useful

In order to map A's B.topic1 to B's topic1, we need to know B.

> 106. [...] should the following properties be prefixed with "consumer."

No, they are part of Connect's worker config.

> 107. So, essentially it's running multiple logical connect clusters on
> the same shared worker nodes?

Correct. Rather than configure each Connector and Worker and Herder
individually, a single top-level configuration file is used. And instead of
running a bunch of separate worker processes on each node, a single process
runs multiple workers. This is implemented using a separate driver based on
ConnectDistributed, but which runs multiple DistributedHerders. Each
DistributedHerder uses a different Kafka cluster for coordination -- they
are completely separate apart from running in the same process.

Thanks for helping improve the doc!

Ryanne

On Fri, Jan 4, 2019 at 10:33 AM Jun Rao <j...@confluent.io> wrote:

> Hi, Ryanne,
>
> Thanks for KIP. Still have a few more comments below.
>
> 100. "This is not possible with MirrorMaker today -- records would be
> replicated back and forth indefinitely, and the topics in either cluster
> would be merged inconsistently between clusters. " This is not 100% true
> since MM can do the topic renaming through MirrorMakerMessageHandler.
>
> 101. For both Heartbeat and checkpoint, could you define the full schema,
> including the field type? Also how are they serialized into the Kafka
> topic? Is it JSON or sth else?
> For convenience, it would be useful to
> provide a built-in MessageFormatter so that one can read each topic's data
> using tools like ConsoleConsumer.
>
> 102. For the public Heartbeat and Checkpoint class, could you list the
> public methods in each class?
>
> 103. I am wondering why is Heartbeat a separate connector? A MirrorMaker
> connector can die independent of the Heartbeat connector, which seems to
> defeat the purpose of heartbeat.
>
> 104. Is the Heartbeat topic also a compacted topic? If not, how long is it
> retained for?
>
> 105. For the following, I am not sure why targetClusterAlias is useful? The
> checkpoint file seems to only include sourceClusterAlias.
>
> Map<TopicPartition, Long> translateOffsets(Map<?, ?> targetConsumerConfig,
> String sourceClusterAlias, String targetClusterAlias, String remoteGroupId)
>
> 106. In the configuration example, should the following properties be
> prefixed with "consumer."?
> key.converter = org.apache.kafka.connect.converters.ByteArrayConverter
> value.converter = org.apache.kafka.connect.converters.ByteArrayConverter
>
> 107. Could you add a bit more description on how connect-mirror-maker.sh is
> implemented? My understanding is that it will start as many as
> separate DistributedHerder as the Kafka clusters? So, essentially it's
> running multiple logical connect clusters on the same shared worker nodes?
>
> Thanks,
>
> Jun
>
>
> On Thu, Dec 20, 2018 at 5:23 PM Srinivas Reddy <srinivas96all...@gmail.com>
> wrote:
>
> > +1 (non binding)
> >
> > Thank you Ryan for the KIP, let me know if you need support in
> > implementing it.
> >
> > -
> > Srinivas
> >
> > - Typed on tiny keys. pls ignore typos.{mobile app}
> >
> >
> > On Fri, 21 Dec, 2018, 08:26 Ryanne Dolan <ryannedo...@gmail.com> wrote:
> >
> > > Thanks for the votes so far!
> > >
> > > Due to recent discussions, I've removed the high-level REST API from
> > > the KIP.
> > >
> > > On Thu, Dec 20, 2018 at 12:42 PM Paul Davidson
> > > <pdavid...@salesforce.com> wrote:
> > >
> > > > +1
> > > >
> > > > Would be great to see the community build on the basic approach we
> > > > took with Mirus. Thanks Ryanne.
> > > >
> > > > On Thu, Dec 20, 2018 at 9:01 AM Andrew Psaltis
> > > > <psaltis.and...@gmail.com> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Really looking forward to this and to helping in any way I can.
> > > > > Thanks for kicking this off Ryanne.
> > > > >
> > > > > On Thu, Dec 20, 2018 at 10:18 PM Andrew Otto <o...@wikimedia.org>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > This looks like a huge project! Wikimedia would be very excited
> > > > > > to have this. Thanks!
> > > > > >
> > > > > > On Thu, Dec 20, 2018 at 9:52 AM Ryanne Dolan
> > > > > > <ryannedo...@gmail.com> wrote:
> > > > > >
> > > > > > > Hey y'all, please vote to adopt KIP-382 by replying +1 to this
> > > > > > > thread.
> > > > > > >
> > > > > > > For your reference, here are the highlights of the proposal:
> > > > > > >
> > > > > > > - Leverages the Kafka Connect framework and ecosystem.
> > > > > > > - Includes both source and sink connectors.
> > > > > > > - Includes a high-level driver that manages connectors in a
> > > > > > >   dedicated cluster.
> > > > > > > - High-level REST API abstracts over connectors between
> > > > > > >   multiple Kafka clusters.
> > > > > > > - Detects new topics, partitions.
> > > > > > > - Automatically syncs topic configuration between clusters.
> > > > > > > - Manages downstream topic ACL.
> > > > > > > - Supports "active/active" cluster pairs, as well as any
> > > > > > >   number of active clusters.
> > > > > > > - Supports cross-data center replication, aggregation, and
> > > > > > >   other complex topologies.
> > > > > > > - Provides new metrics including end-to-end replication
> > > > > > >   latency across multiple data centers/clusters.
> > > > > > > - Emits offsets required to migrate consumers between clusters.
> > > > > > > - Tooling for offset translation.
> > > > > > > - MirrorMaker-compatible legacy mode.
> > > > > > >
> > > > > > > Thanks, and happy holidays!
> > > > > > > Ryanne
> > > >
> > > > --
> > > > Paul Davidson
> > > > Principal Engineer, Ajna Team
> > > > Big Data & Monitoring