Thank you for your replies Guozhang and Neha, though I have some followup questions.
I wrote my own Java Consumer and Producer based off of the Kafka Producer API and High Level Consumer. Let's call them MyConsumer and MyProducer. MyProducer uses a custom Partitioner class called SimplePartitioner. In the producer.config file that I specify when I run the MirrorMaker from the command line, there is a parameter "partitioner.class". I keep getting "ClassDefNotFoundException exceptions, no matter if I put the absolute path to my SimplePartitioner.class file, a relative path, or even when I add SimplePartitioner.class to the $CLASSPATH variables created in the kafka-run-class.sh script. Here is my output error: Exception in thread "main" java.lang.ClassNotFoundException: SimplePartitioner.class at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:191) at kafka.utils.Utils$.createObject(Utils.scala:438) at kafka.producer.Producer.<init>(Producer.scala:60) at kafka.tools.MirrorMaker$$anonfun$1.apply(MirrorMaker.scala:116) at kafka.tools.MirrorMaker$$anonfun$1.apply(MirrorMaker.scala:106) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.immutable.Range.foreach(Range.scala:81) at scala.collection.TraversableLike$class.map(TraversableLike.scala:233) at scala.collection.immutable.Range.map(Range.scala:46) at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:106) at kafka.tools.MirrorMaker.main(MirrorMaker.scala) What is the correct value for the "partitioner.class" parameter in my producer.properties config file? Guozhang, in your reply to my original message you said "When the consumer of the MM gets a message, put the message to the producer's queue...". This seems to imply that I can specify my own custom Consumer and Producer when I run the Mirrormaker. How can I do this? Or, if I'm understand incorrectly and I have to use whichever default consumer/producer the Mirrormaker uses, how can I get that consumer to learn which partition it's reading from, pass that info to the producer, and then specify that partition ID when the producer rights to the target cluster? -Alex On Wed, Nov 26, 2014 at 10:33 AM, Guozhang Wang <wangg...@gmail.com> wrote: > Hello Alex, > > This can be done by doing some tweaks in the MM code (with the 0.8.2 new > producer). > > 1. Set-up your MM to have the total # of producers equal to the #. of > partitions in source / target cluster. > > 2. When the consumer of the MM gets a message, put the message to the > producer's queue based on its partition id; i.e. if the partition id is n, > put to n's producer queue. > > 3. When producer sends the data, specify the partition id; so each producer > will only send to a single partition. > > Guozhang > > > On Tue, Nov 25, 2014 at 8:19 PM, Alex Melville <amelvi...@g.hmc.edu> > wrote: > > > Howdy friends, > > > > > > I'd like to mirror the topics on several clusters to a central cluster, > and > > I'm looking at using the default Mirrormaker to do so. I've already done > > some basic testing on the Mirrormaker found here: > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 > > > > and managed to successfully copy a topic's partitions on a source cluster > > to a topic on a target cluster. So I'm able to mirror correctly. However > > for my particular use case I need to ensure that when I copy a topic's > > partitions from source cluster to target cluster, a partition created on > > the target cluster contains data in the exact same order as the data on > the > > corresponding partition on the source cluster. > > > > I'm thinking of writing a Simple Consumer so I can manually compare the > > events in a source cluster's partition with the corresponding partition > on > > the target cluster, but I'm not 100% sure if I'll be able to verify my > > guarantee if I do it this way. Can anyone here verify that partitions > > copied over to the target cluster by the default Mirrormaker are an exact > > copy of those on the source cluster? > > > > > > Thanks in advance, > > > > Alex Melville > > > > > > -- > -- Guozhang >