If you are using the old producer for MirrorMaker, you can specify a custom partitioner for the MirrorMaker producer that uses exactly the same logic to partition messages as your custom producer does. If you are using the new Java producer, there is currently no way to do this. We are working on adding a message handler to MirrorMaker; once that is available, you can use the message handler to specify which partition each message is sent to.
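A minimal sketch of the custom-partitioner approach above, under stated assumptions. The "[B cannot be cast to java.lang.String" error is most likely because MirrorMaker hands its producer raw `byte[]` keys, so a partitioner that casts the key to `String` fails. Hashing the `byte[]` directly does not help either: a Java array's `hashCode()` is identity-based, not content-based. The sketch therefore decodes the bytes back to a `String` (assuming the original producer encoded keys as UTF-8) and then applies the same logic as the custom producer. The class name and `partitionForStringKey` are hypothetical; the hash-modulo rule is only a stand-in for whatever your producer really does. In a real deployment this would be a class implementing the old producer's `kafka.producer.Partitioner`, named via `partitioner.class` in the MirrorMaker producer config; check the exact interface against your Kafka version.

```java
import java.nio.charset.StandardCharsets;

// Sketch only (hypothetical class name). In practice this logic would live in
// a class implementing the old producer's kafka.producer.Partitioner.
final class StringKeyPartitionerSketch {

    // HYPOTHETICAL stand-in for the custom producer's partitioning logic:
    // non-negative String hash modulo the partition count. Replace this with
    // whatever your producer actually does.
    static int partitionForStringKey(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Shaped like Partitioner#partition(key, numPartitions). MirrorMaker passes
    // raw byte[] keys, so decode them back to the original String first.
    int partition(Object key, int numPartitions) {
        final String k;
        if (key instanceof byte[]) {
            // ASSUMPTION: the original producer encoded String keys as UTF-8.
            k = new String((byte[]) key, StandardCharsets.UTF_8);
        } else if (key instanceof String) {
            k = (String) key;
        } else {
            throw new IllegalArgumentException("Unsupported key type: "
                    + (key == null ? "null" : key.getClass().getName()));
        }
        return partitionForStringKey(k, numPartitions);
    }

    public static void main(String[] args) {
        StringKeyPartitionerSketch p = new StringKeyPartitionerSketch();
        String key = "user-42";
        // Same partition whether the key arrives as a String or as its bytes.
        System.out.println(p.partition(key, 8)
                == p.partition(key.getBytes(StandardCharsets.UTF_8), 8));
    }
}
```

The important property is that a `String` key on the source side and its encoded bytes on the mirror side land in the same partition number, which is what keeps per-partition ordering aligned when both clusters have the same partition count.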
In terms of verification, I think consuming all the messages and comparing them is probably still necessary for a strong guarantee. I don't think we have any tools available for data verification.

-Jiangjie (Becket) Qin

On 2/21/15, 4:18 PM, "Alex Melville" <amelvi...@g.hmc.edu> wrote:

>Howdy Kafka Team,
>
>We are trying to aggregate every topic on different geographically separate
>clusters into one central Kafka cluster. We have the guarantee that the
>number of partitions for a given topic will be the same on the source and
>target clusters. Due to our particular use case, we need to make sure that
>the events in any given partition on a source cluster appear in exactly the
>same order in the corresponding partition on the target cluster.
>
>So far we've used our custom producer to push messages with a String key
>and a byte[] message type to the source cluster. But when we use
>MirrorMaker to copy from the source to the target cluster, if we use the
>same partitioner that our custom producer uses, we get an error saying
>"[B cannot be cast to java.lang.String". We understand this to mean that
>the MM consumer is trying to partition the source cluster's data using a
>String key, but since the message residing on the source cluster is in
>byte[] form, using a String key makes no sense. However, we need the
>producer that pushes to the target cluster to use exactly the same
>partitioning scheme our custom producer used, so that the ordering on the
>source and target partitions is exactly the same. How can we ensure this?
>
>Once we have correctly mirrored exactly ordered partitions, what is the
>best way to verify that the source and target partitions store messages
>in exactly the same order?
>Right now we are thinking about writing a SimpleConsumer that iterates
>through the logs of the source and target partitions, comparing them to
>each other as the iteration proceeds, but it'd be nice if there was an
>existing tool for doing this, or if we could have some guarantee that MM
>will retain partition ordering by default.
>
>Cheers,
>
>Alex Melville
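A minimal sketch of the consume-and-compare check discussed above. It deliberately leaves out how messages are fetched (SimpleConsumer or any other consumer); it only assumes each side is exposed as an iterator yielding message payloads in offset order, starting from the same logical message. The class and method names are hypothetical. It walks both sequences in lockstep and reports the position of the first divergence, or -1 if they are identical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper. ASSUMPTION: both iterators yield payloads in offset
// order and start at the same logical message on source and target.
final class PartitionComparator {

    // Returns -1 if both sequences hold identical messages in identical order;
    // otherwise the zero-based position of the first mismatch, including the
    // position where one sequence runs out before the other.
    static long firstMismatch(Iterator<byte[]> source, Iterator<byte[]> target) {
        long position = 0;
        while (source.hasNext() && target.hasNext()) {
            // Content comparison; byte[].equals would compare identity only.
            if (!Arrays.equals(source.next(), target.next())) {
                return position;
            }
            position++;
        }
        // Identical only if both sides are exhausted at the same time.
        return (source.hasNext() || target.hasNext()) ? position : -1;
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        byte[] b = "b".getBytes(StandardCharsets.UTF_8);
        byte[] c = "c".getBytes(StandardCharsets.UTF_8);
        List<byte[]> src = Arrays.asList(a, b, c);
        List<byte[]> dst = Arrays.asList(a, c, b); // reordered on the mirror
        System.out.println(firstMismatch(src.iterator(), dst.iterator()));
    }
}
```

The comparison is by position rather than by offset number, because the target cluster assigns its own offsets to mirrored messages. Comparing keys as well as payloads would follow the same pattern.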