Going back to your previous requirement of ensuring that the data in the target cluster is in the same order as the source cluster, all you need is to specify a key with every record in your data. The mirror maker and its producer takes care of placing all the data for a particular key in the same partition on the target cluster. Effectively, all your data will be in the same order (though there may be a few duplicates as I mentioned before).
Hope that helps! On Fri, Dec 5, 2014 at 1:23 PM, Alex Melville <amelvi...@g.hmc.edu> wrote: > Thank you for your replies Guozhang and Neha, though I have some followup > questions. > > I wrote my own Java Consumer and Producer based off of the Kafka Producer > API and High Level Consumer. Let's call them MyConsumer and MyProducer. > MyProducer uses a custom Partitioner class called SimplePartitioner. In the > producer.config file that I specify when I run the MirrorMaker from the > command line, there is a parameter "partitioner.class". I keep getting > "ClassDefNotFoundException exceptions, no matter if I put the absolute path > to my SimplePartitioner.class file, a relative path, or even when I add > SimplePartitioner.class to the $CLASSPATH variables created in the > kafka-run-class.sh script. Here is my output error: > > Exception in thread "main" java.lang.ClassNotFoundException: > SimplePartitioner.class > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at kafka.utils.Utils$.createObject(Utils.scala:438) > at kafka.producer.Producer.<init>(Producer.scala:60) > at kafka.tools.MirrorMaker$$anonfun$1.apply(MirrorMaker.scala:116) > at kafka.tools.MirrorMaker$$anonfun$1.apply(MirrorMaker.scala:106) > at > > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) > at > > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) > at scala.collection.immutable.Range.foreach(Range.scala:81) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:233) > at scala.collection.immutable.Range.map(Range.scala:46) > at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:106) > at kafka.tools.MirrorMaker.main(MirrorMaker.scala) > > What is the correct value for the "partitioner.class" parameter in my > producer.properties config file? > > > > > Guozhang, in your reply to my original message you said "When the consumer > of the MM gets a message, put the message to the producer's queue...". This > seems to imply that I can specify my own custom Consumer and Producer when > I run the Mirrormaker. How can I do this? Or, if I'm understand incorrectly > and I have to use whichever default consumer/producer the Mirrormaker uses, > how can I get that consumer to learn which partition it's reading from, > pass that info to the producer, and then specify that partition ID when the > producer rights to the target cluster? > > > -Alex > > On Wed, Nov 26, 2014 at 10:33 AM, Guozhang Wang <wangg...@gmail.com> > wrote: > > > Hello Alex, > > > > This can be done by doing some tweaks in the MM code (with the 0.8.2 new > > producer). > > > > 1. Set-up your MM to have the total # of producers equal to the #. of > > partitions in source / target cluster. > > > > 2. When the consumer of the MM gets a message, put the message to the > > producer's queue based on its partition id; i.e. if the partition id is > n, > > put to n's producer queue. > > > > 3. When producer sends the data, specify the partition id; so each > producer > > will only send to a single partition. > > > > Guozhang > > > > > > On Tue, Nov 25, 2014 at 8:19 PM, Alex Melville <amelvi...@g.hmc.edu> > > wrote: > > > > > Howdy friends, > > > > > > > > > I'd like to mirror the topics on several clusters to a central cluster, > > and > > > I'm looking at using the default Mirrormaker to do so. I've already > done > > > some basic testing on the Mirrormaker found here: > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 > > > > > > and managed to successfully copy a topic's partitions on a source > cluster > > > to a topic on a target cluster. So I'm able to mirror correctly. > However > > > for my particular use case I need to ensure that when I copy a topic's > > > partitions from source cluster to target cluster, a partition created > on > > > the target cluster contains data in the exact same order as the data on > > the > > > corresponding partition on the source cluster. > > > > > > I'm thinking of writing a Simple Consumer so I can manually compare the > > > events in a source cluster's partition with the corresponding partition > > on > > > the target cluster, but I'm not 100% sure if I'll be able to verify my > > > guarantee if I do it this way. Can anyone here verify that partitions > > > copied over to the target cluster by the default Mirrormaker are an > exact > > > copy of those on the source cluster? > > > > > > > > > Thanks in advance, > > > > > > Alex Melville > > > > > > > > > > > -- > > -- Guozhang > > > -- Thanks, Neha