Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-27 Thread Gwen Shapira
Makes sense to me. On Fri, Feb 27, 2015 at 12:59 PM, Jiangjie Qin wrote: > I think it probably needs another KIP to discuss the command line tool > standardization because it is essentially a cross-board user interface > change. > For this specific KIP, I believe the scope is just to make sure we f

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-27 Thread Gwen Shapira
The biggest gap in tool standardization and MirrorMaker is the fact that MirrorMaker takes 2 configuration files as inputs with required parameters that can't be set on command line. If we are breaking back-compatibility, perhaps we can standardize this part too? On Fri, Feb 27, 2015 at 12:36 PM,

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-27 Thread Jay Kreps
Yeah it will break the existing usage but personally I think it is worth it to be standard across all our tools. -Jay On Fri, Feb 27, 2015 at 9:53 AM, Jiangjie Qin wrote: > Hi Jay, > > I just modified the KIP. The only concern I have about this change is that > it will break existing deployment

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-27 Thread Jiangjie Qin
Hi Jay, I just modified the KIP. The only concern I have about this change is that it will break existing deployments. And we need to change the command line arguments format for other tools as well. It is definitely better that we conform to the unix standard. It is just I am not sure if the chang

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-26 Thread Jay Kreps
Can we change the command line arguments for mm to match the command line arguments elsewhere. This proposal seems to have two formats: *--consumer.rebalance.listener* and *--abortOnSendFail* The '.' separators for command line options predate this JIRA but I think the new camelCase option is a new

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-26 Thread Jiangjie Qin
KAFKA-1650 added this option, but we don’t have --no.data.loss in any official release. On 2/26/15, 12:01 PM, "Gwen Shapira" wrote: >Did --no.data.loss exist in previous releases of MirrorMaker? >If it does, maybe we want to keep it around for backward compatibility? >(i.e. so existing deployments

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-26 Thread Gwen Shapira
Did --no.data.loss exist in previous releases of MirrorMaker? If it does, maybe we want to keep it around for backward compatibility? (i.e. so existing deployments won't break)? Gwen On Thu, Feb 26, 2015 at 11:57 AM, Jiangjie Qin wrote: > Hi Neha, > > Thanks for the comment. That’s a really goo

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-26 Thread Jiangjie Qin
Hi Neha, Thanks for the comment. That’s a really good point. Originally I was thinking about allowing users to tweak some parameters as needed. For example, some users might want to have pipeline enabled and can tolerate reordering, some users might want to use acks=1 or acks=0, some might want to move

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-26 Thread Neha Narkhede
Hey Becket, The KIP proposes addition of a --no.data.loss command line option to the MirrorMaker. Though when would the user not want that option? I'm wondering what the benefit of providing that option is if every user would want that for correct mirroring behavior. Other than that, the KIP look

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-25 Thread Jiangjie Qin
For 1), the current design allows you to do it. The customizable message handler takes in a ConsumerRecord and spits out a List; you can just put a topic for the ProducerRecord different from the ConsumerRecord. WRT performance, we did some tests at LinkedIn, and the performance looks good to us. Jiangjie (Beck
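
A minimal sketch of the handler idea described in this message, assuming the handler receives one consumed record and returns a list of records to produce. The `ConsumerRecord` and `ProducerRecord` classes below are simplified stand-ins defined only for this illustration, not the Kafka client classes, and `renaming_handler` and `topic_map` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConsumerRecord:
    """Stand-in for a record read from the source cluster (illustrative only)."""
    topic: str
    partition: int
    offset: int
    key: Optional[bytes]
    value: bytes


@dataclass(frozen=True)
class ProducerRecord:
    """Stand-in for a record to be written to the target cluster (illustrative only)."""
    topic: str
    key: Optional[bytes]
    value: bytes


def renaming_handler(record: ConsumerRecord,
                     topic_map: Dict[str, str]) -> List[ProducerRecord]:
    """Return the records to produce for one consumed record.

    The destination topic is looked up in topic_map; topics without an
    entry are mirrored under their original name.
    """
    destination = topic_map.get(record.topic, record.topic)
    return [ProducerRecord(destination, record.key, record.value)]
```

With a mapping such as `{"srcTopic": "newDestinationTopic"}`, a record consumed from `srcTopic` comes back as a single record addressed to `newDestinationTopic`, which is the use case raised in the message this one replies to. Returning a list also leaves room for a handler to drop a record (empty list) or fan it out.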

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-25 Thread Bhavesh Mistry
Hi Jiangjie, It might be too late, but I wanted to bring up the following use case for adopting the new MM: 1) Ability to publish messages from a src topic to a different destination topic via --overidenTopics=srcTopic:newDestinationTopic In order to adopt, new MM enhancement customer will compare performa

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-24 Thread Jiangjie Qin
Sure! Just created the voting thread :) On 2/24/15, 4:44 PM, "Jay Kreps" wrote: >Hey Jiangjie, > >Let's do an official vote so that we know what we are voting on and we are >crisp on what the outcome was. This thread is very long :-) > >-Jay > >On Tue, Feb 24, 2015 at 2:53 PM, Jiangjie Qin >wrot

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-24 Thread Jay Kreps
Hey Jiangjie, Let's do an official vote so that we know what we are voting on and we are crisp on what the outcome was. This thread is very long :-) -Jay On Tue, Feb 24, 2015 at 2:53 PM, Jiangjie Qin wrote: > I updated the KIP page based on the discussion we had. > > Should I launch another vo

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-24 Thread Jiangjie Qin
I forgot to mention that we have tested the entire design in our test environment. The performance looks fine. Jiangjie (Becket) Qin On 2/24/15, 2:53 PM, "Jiangjie Qin" wrote: >I updated the KIP page based on the discussion we had. > >Should I launch another vote or we can think of this mail th

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-24 Thread Jiangjie Qin
I updated the KIP page based on the discussion we had. Should I launch another vote, or can we consider this mail thread to have already included a vote? Jiangjie (Becket) Qin On 2/11/15, 5:15 PM, "Neha Narkhede" wrote: >Thanks for the explanation, Joel! Would love to see the results of the >throug

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-11 Thread Neha Narkhede
Thanks for the explanation, Joel! Would love to see the results of the throughput experiment and I'm a +1 on everything else, including the rebalance callback and record handler. -Neha On Wed, Feb 11, 2015 at 1:13 PM, Jay Kreps wrote: > Cool, I agree with all that. > > I agree about the need fo

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-11 Thread Jay Kreps
Cool, I agree with all that. I agree about the need for a rebalancing callback. Totally agree about record handler. It would be great to see if a prototype of this is workable. Thanks guys! -Jay On Wed, Feb 11, 2015 at 12:36 PM, Joel Koshy wrote: > Hey Jay, > > Guozhang, Becket and I got to

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-11 Thread Joel Koshy
Hey Jay, Guozhang, Becket and I got together to discuss this and we think: - It seems that your proposal based on the new consumer and flush call should work. - We would likely need to call the poll with a timeout that matches the offset commit interval in order to deal with low volume mirr
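
The consumer-plus-flush approach mentioned here can be sketched roughly as below. This is an illustration under assumptions, not the code from the thread: `InMemoryConsumer` and `InMemoryProducer` are hypothetical in-memory stand-ins for the real clients, and the poll timeout discussed in the message is left out. The point is only the ordering: send, then flush, then commit, so that offsets are committed only for data that has already been delivered.

```python
class InMemoryConsumer:
    """Hypothetical stand-in for a consumer; tracks position and committed offset."""

    def __init__(self, records):
        self._records = list(records)
        self._position = 0
        self.committed = 0

    def poll(self, max_records):
        # Return up to max_records records and advance the read position.
        batch = self._records[self._position:self._position + max_records]
        self._position += len(batch)
        return batch

    def commit(self):
        # Commit everything that has been polled so far.
        self.committed = self._position


class InMemoryProducer:
    """Hypothetical stand-in for a producer with buffered sends."""

    def __init__(self):
        self._buffer = []
        self.delivered = []

    def send(self, record):
        # Buffer the record; it is not yet considered delivered.
        self._buffer.append(record)

    def flush(self):
        # Deliver everything buffered so far.
        self.delivered.extend(self._buffer)
        self._buffer.clear()


def mirror_once(consumer, producer, max_records=2):
    """Run one poll/send/flush/commit cycle and return the batch size."""
    batch = consumer.poll(max_records)
    for record in batch:
        producer.send(record)
    producer.flush()    # make sure the batch is delivered ...
    consumer.commit()   # ... before its offsets are committed
    return len(batch)
```

Calling `mirror_once` in a loop until it returns 0 drains the stand-in consumer. If the process died between `flush` and `commit`, the uncommitted batch would be re-read on restart, giving duplicates rather than loss.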

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-11 Thread Jay Kreps
Guozhang, I agree with 1-3, I do think what I was proposing was simpler but perhaps there are gaps in that? Hey Joel--Here was a sketch of what I was proposing. I do think this gets rid of manual offset tracking, especially doing so across threads with dedicated commit threads, which I think is p

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-11 Thread Joel Koshy
Hi Jay, > The data channels are actually a big part of the complexity of the zero > data loss design, though, right? Because then you need some reverse channel > to flow the acks back to the consumer based on where you are versus just > acking what you have read and written (as in the code snippet

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-10 Thread Guozhang Wang
Being late on this thread... I think people are on the same page that: 1. We need the zero data-loss MM for sure. 2. The handler function itself is not adding much complexity to MM. 3. The current zero data-loss approach based on the data channel seems too complex. For 3), I agree with Joel that

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-10 Thread Jay Kreps
Hey Joel, The data channels are actually a big part of the complexity of the zero data loss design, though, right? Because then you need some reverse channel to flow the acks back to the consumer based on where you are versus just acking what you have read and written (as in the code snippet I put

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-10 Thread Joel Koshy
On Tue, Feb 10, 2015 at 12:13:46PM -0800, Neha Narkhede wrote: > I think all of us agree that we want to design MirrorMaker for 0 data loss. > With the absence of the data channel, 0 data loss will be much simpler to > implement. The data channel is irrelevant to the implementation of zero data

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-10 Thread Neha Narkhede
I think all of us agree that we want to design MirrorMaker for 0 data loss. With the absence of the data channel, 0 data loss will be much simpler to implement. My arguments for adding a message handler are that: > 1. It is more efficient to do something in common for all the clients in > pipeline

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-10 Thread Joel Koshy
I think the message handler adds little to no complexity to the mirror maker. Jay/Neha, the MM became scary due to the rearchitecture we did for 0.8 due to performance issues compared with 0.7 - we should remove the data channel if it can match the current throughput. I agree it is worth prototypin

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-09 Thread Jiangjie Qin
I just updated the KIP page and incorporated Jay and Neha’s suggestion. As a brief summary of where we are: Consensus reached: Have N independent mirror maker threads each has their own consumers but share a producer. The mirror maker threads will be responsible for decompression, compression and

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Jiangjie Qin
Hi Jay, thanks a lot for the comments. I think this solution is better. We probably don’t need the data channel anymore. It can be replaced with a list of producers if we need more sender threads. I’ll update the KIP page. The reasoning about message handler is mainly for efficiency purposes. I’m thinkin

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Jiangjie Qin
Thanks for the feedback, Neha. Please see inline replies. --Jiangjie (Becket) Qin On 2/8/15, 2:40 PM, "Neha Narkhede" wrote: >Few comments - > >1. Why do we need the message handler? Do you have concrete use cases in >mind? If not, we should consider adding it in the future when/if we do >have >

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Jay Kreps
Yeah, I second Neha's comments. The current mm code has taken something pretty simple and made it pretty scary with callbacks and wait/notify stuff. Do we believe this works? I can't tell by looking at it which is kind of bad for something important like this. I don't mean this as criticism, I know

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Neha Narkhede
Few comments - 1. Why do we need the message handler? Do you have concrete use cases in mind? If not, we should consider adding it in the future when/if we do have use cases for it. The purpose of the mirror maker is a simple tool for setting up Kafka cluster replicas. I don't see why we need to i

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Jiangjie Qin
Hi Neha, Yes, I’ve updated the KIP so the entire KIP is based on new consumer now. I’ve put both designs with and without data channel in the KIP as I still feel we might need the data channel to provide more flexibility, especially after message handler is introduced. I’ve put my thinking of the

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-08 Thread Neha Narkhede
Hey Becket, What are the next steps on this KIP? As per your comment earlier on the thread - I do agree it makes more sense > to avoid duplicate effort and plan based on new consumer. I’ll modify the > KIP. Did you get a chance to think about the simplified design that we proposed earlier? Do y

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-04 Thread Jiangjie Qin
In mirror maker we do not do de-serialization on the messages. Mirror maker uses the source TopicPartition hash to choose a producer to send messages from the same source partition. The partition those messages end up in is decided by the Partitioner class in KafkaProducer (assuming you are using the new
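
A rough sketch of the producer selection described in this message, assuming a fixed pool of producers indexed from 0. The message does not say which hash function is used, so CRC32 over a "topic-partition" string is an assumption chosen here only because it is deterministic across runs; `choose_producer_index` is a hypothetical name.

```python
import zlib


def choose_producer_index(topic: str, partition: int, num_producers: int) -> int:
    """Map a source topic-partition to one producer in a pool.

    All messages from the same source partition map to the same index,
    so they pass through a single producer in the order they were read.
    CRC32 is an illustrative choice, not the hash named in the thread.
    """
    if num_producers <= 0:
        raise ValueError("num_producers must be positive")
    key = f"{topic}-{partition}".encode("utf-8")
    return zlib.crc32(key) % num_producers
```

This only fixes which producer handles a source partition. Which destination partition a message lands in is a separate decision, made by the producer's partitioner as the message says.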

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-04 Thread Bhavesh Mistry
Hi Jiangjie, Thanks for entertaining my questions so far. The last question I have is about serialization of the message key. If the key de-serialization (Class) is not present at the MM instance, then does it use the raw byte hashcode to determine the partition? How are you going to address the situation

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-30 Thread Jiangjie Qin
Hi Bhavesh, Please see inline comments. Jiangjie (Becket) Qin On 1/29/15, 7:00 PM, "Bhavesh Mistry" wrote: >Hi Jiangjie, > >Thanks for the input. > >a) Is MM will producer ack will be attach to Producer Instance or per >topic. Use case is that one instance of MM >needs to handle both strong

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-29 Thread Bhavesh Mistry
Hi Jiangjie, Thanks for the input. a) Will the MM producer ack be attached to the Producer instance or set per topic? The use case is that one instance of MM needs to handle both strong ack and also ack=0 for some topics. Or would it be better to set up another instance of MM? b) Regarding TCP connectio

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-29 Thread Jiangjie Qin
Hi Bhavesh, I think it is the right discussion to have when we are talking about the new design for MM. Please see the inline comments. Jiangjie (Becket) Qin On 1/28/15, 10:48 PM, "Bhavesh Mistry" wrote: >Hi Jiangjie, > >I just wanted to let you know about our use case and stress the point

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-28 Thread Bhavesh Mistry
Hi Jiangjie, I just wanted to let you know about our use case and stress the point that local data center broker cluster have fewer partitions than the destination offline broker cluster. Just because we do the batch pull from CAMUS and in order to drain data faster than the injection rate (from f

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-28 Thread Jiangjie Qin
I’ve updated the KIP page. Feedback is welcome. Regarding the simple mirror maker design, I thought it over and have some worries. There are two things that might be worth thinking about: 1. One of the enhancements to mirror maker is adding a message handler to do things like reformatting. I think we migh

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-26 Thread Jiangjie Qin
Hi Jay and Neha, Thanks a lot for the reply and explanation. I do agree it makes more sense to avoid duplicate effort and plan based on new consumer. I’ll modify the KIP. To Jay’s question on message ordering - The data channel selection makes sure that the messages from the same source partition

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-25 Thread Neha Narkhede
I think there is some value in investigating if we can go back to the simple mirror maker design, as Jay points out. Here you have N threads, each has a consumer and a producer. The reason why we had to move away from that was a combination of the difference in throughput between the consumer and

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-23 Thread Jay Kreps
QQ: If we ever use a different technique for the data channel selection than for the producer partitioning won't that break ordering? How can we ensure these things stay in sync? With respect to the new consumer--I really do want to encourage people to think through how MM will work with the new c

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-23 Thread Joe Stein
I think that before we start making more changes to Mirror Maker there should be unit tests in place for it. Currently Mirror Maker is broken on trunk (there is a patch to fix it) because of a recent change. That is only going to get more unwieldy as more change happens. On Wed, Jan 21, 2015 at

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-21 Thread Jiangjie Qin
Hi Jay, Thanks for comments. Please see inline responses. Jiangjie (Becket) Qin On 1/21/15, 1:33 PM, "Jay Kreps" wrote: >Hey guys, > >A couple questions/comments: > >1. The callback and user-controlled commit offset functionality is already >in the new consumer which we are working on in paral

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-21 Thread Jay Kreps
Hey guys, A couple questions/comments: 1. The callback and user-controlled commit offset functionality is already in the new consumer which we are working on in parallel. If we accelerated that work it might help concentrate efforts. I admit this might take slightly longer in calendar time but co

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-21 Thread Jiangjie Qin
Currently it is a manual process. For functional tests, I just set up two Kafka clusters locally, mirror between them, and keep producing data to one of the clusters. Then I try a hard kill / bounce of mirror maker to see if messages are lost in the target cluster. Jiangjie (Becket) Qin On 1/21/15, 12:24 PM,

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-21 Thread Gwen Shapira
Thanks for the answers. Much clearer now :) Unrelated question: How do you test MirrorMaker (especially around data loss)? I didn't see any unit-tests or integration tests in trunk. Gwen On Wed, Jan 21, 2015 at 9:55 AM, Jiangjie Qin wrote: > Hi Gwen, > > Please see inline answers. I'll update t

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-21 Thread Jiangjie Qin
Hi Gwen, Please see inline answers. I'll update them in the KIP as well. Thanks. Jiangjie (Becket) Qin On 1/20/15, 6:39 PM, "Gwen Shapira" wrote: >Thanks for the detailed document, Jiangjie. Super helpful. > >Few questions: > >1. You mention that "A ConsumerRebalanceListener class is created

Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-20 Thread Gwen Shapira
Thanks for the detailed document, Jiangjie. Super helpful. Few questions: 1. You mention that "A ConsumerRebalanceListener class is created and could be wired into ZookeeperConsumerConnector to avoid duplicate messages when consumer rebalance occurs in mirror maker." Is this something the user n

[KIP-DISCUSSION] Mirror Maker Enhancement

2015-01-20 Thread Jiangjie Qin
Hi Kafka Devs, We are working on Kafka Mirror Maker enhancement. A KIP is posted to document and discuss the following: 1. KAFKA-1650: No Data loss mirror maker change 2. KAFKA-1839: To allow partition aware mirror. 3. KAFKA-1840: To allow message filtering/format conversion Feedbacks are wel