Makes sense to me.
On Fri, Feb 27, 2015 at 12:59 PM, Jiangjie Qin wrote:
> I think it probably needs another KIP to discuss the command line tool
> standardization because it is essentially a cross-board user interface
> change.
> For this specific KIP, I believe the scope is just to make sure we f
The biggest gap between tool standardization and MirrorMaker is the fact
that MirrorMaker takes two configuration files as input, with required
parameters that can't be set on the command line.
If we are breaking back-compatibility, perhaps we can standardize this part too?
On Fri, Feb 27, 2015 at 12:36 PM,
Yeah it will break the existing usage but personally I think it is worth it
to be standard across all our tools.
-Jay
On Fri, Feb 27, 2015 at 9:53 AM, Jiangjie Qin wrote:
Hi Jay,
I just modified the KIP. The only concern I have about this change is that
it will break existing deployments. And we need to change the command line
arguments format for other tools as well. It is definitely better that we
conform to the unix standard. It is just I am not sure if the chang
Can we change the command line arguments for mm to match the command line
arguments elsewhere? This proposal seems to have two formats:
*--consumer.rebalance.listener*
and
*--abortOnSendFail*
The '.' separators for command line options predate this JIRA but I think
the new camelCase option is a new
KAFKA-1650 added this option, but we don't have --no.data.loss in any
official release.
On 2/26/15, 12:01 PM, "Gwen Shapira" wrote:
Did --no.data.loss exist in previous releases of MirrorMaker?
If it does, maybe we want to keep it around for backward compatibility?
(i.e. so existing deployments won't break)?
Gwen
On Thu, Feb 26, 2015 at 11:57 AM, Jiangjie Qin wrote:
Hi Neha,
Thanks for the comment. That’s a really good point.
Originally I was thinking about allowing users to tweak some parameters as
needed.
For example, some users might want to have pipelining enabled and can
tolerate reordering, some might want to use acks=1 or acks=0, some
might want to move
Hey Becket,
The KIP proposes addition of a --no.data.loss command line option to the
MirrorMaker. Though when would the user not want that option? I'm wondering
what the benefit of providing that option is if every user would want that
for correct mirroring behavior.
Other than that, the KIP look
For 1), the current design allows you to do it. The customizable message
handler takes in a ConsumerRecord and spits out a List<ProducerRecord>; you can
just give the ProducerRecord a topic different from the ConsumerRecord's.
WRT performance, we did some tests at LinkedIn and the performance looks good
to us.
Jiangjie (Becket) Qin
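A sketch of what that handler shape could look like (the types below are simplified stand-ins, not the actual Kafka ConsumerRecord/ProducerRecord classes, and the handler interface name is illustrative):

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-in for a Kafka record; the real handler would operate
// on ConsumerRecord/ProducerRecord from the Kafka clients library.
class Record {
    final String topic;
    final byte[] key;
    final byte[] value;

    Record(String topic, byte[] key, byte[] value) {
        this.topic = topic;
        this.key = key;
        this.value = value;
    }
}

// The customizable handler: one consumed record in, a list of records to
// produce out (so it can also filter or fan out).
interface MessageHandler {
    List<Record> handle(Record consumed);
}

// Example: mirror "srcTopic" into a differently named destination topic,
// passing the payload through unchanged.
class TopicRenamingHandler implements MessageHandler {
    public List<Record> handle(Record consumed) {
        String dest = consumed.topic.equals("srcTopic") ? "destTopic" : consumed.topic;
        return Collections.singletonList(new Record(dest, consumed.key, consumed.value));
    }
}
```

This is also how the --overidenTopics use case raised earlier in the thread could be served, since the handler is free to emit records for any destination topic.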
Hi Jiangjie,
It might be too late, but I wanted to bring up the following use case for
adopting the new MM:
1) Ability to publish messages from a src topic to a different destination topic
via --overidenTopics=srcTopic:newDestinationTopic
In order to adopt the new MM enhancement, customers will compare performa
Sure! Just created the voting thread :)
On 2/24/15, 4:44 PM, "Jay Kreps" wrote:
Hey Jiangjie,
Let's do an official vote so that we know what we are voting on and we are
crisp on what the outcome was. This thread is very long :-)
-Jay
On Tue, Feb 24, 2015 at 2:53 PM, Jiangjie Qin wrote:
I forgot to mention that we have tested the entire design in our test
environment. The performance looks fine.
Jiangjie (Becket) Qin
On 2/24/15, 2:53 PM, "Jiangjie Qin" wrote:
I updated the KIP page based on the discussion we had.
Should I launch another vote, or can we consider this mail thread to have
already included a vote?
Jiangjie (Becket) Qin
On 2/11/15, 5:15 PM, "Neha Narkhede" wrote:
Thanks for the explanation, Joel! Would love to see the results of the
throughput experiment and I'm a +1 on everything else, including the
rebalance callback and record handler.
-Neha
On Wed, Feb 11, 2015 at 1:13 PM, Jay Kreps wrote:
Cool, I agree with all that.
I agree about the need for a rebalancing callback.
Totally agree about record handler.
It would be great to see if a prototype of this is workable.
Thanks guys!
-Jay
On Wed, Feb 11, 2015 at 12:36 PM, Joel Koshy wrote:
Hey Jay,
Guozhang, Becket and I got together to discuss this and we think:
- It seems that your proposal based on the new consumer and flush call
should work.
- We would likely need to call the poll with a timeout that matches
the offset commit interval in order to deal with low volume
mirr
Guozhang, I agree with 1-3, I do think what I was proposing was simpler but
perhaps there are gaps in that?
Hey Joel--Here was a sketch of what I was proposing. I do think this gets
rid of manual offset tracking, especially doing so across threads with
dedicated commit threads, which I think is p
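To make the flush-then-commit idea concrete, here is a toy model of one commit interval. This is my paraphrase of the sketch, not actual MirrorMaker code: poll, send, flush, and commitSync are modeled with plain lists and an offset counter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the flush-then-commit loop: poll a batch, send it all,
// flush, and only then commit offsets. A crash between flush and commit
// can duplicate messages but never lose them.
class MirrorLoopModel {
    final List<String> source;                      // stands in for the source cluster
    final List<String> target = new ArrayList<>();  // stands in for the target cluster
    int committedOffset = 0;                        // durable after the last "commit"

    MirrorLoopModel(List<String> source) {
        this.source = source;
    }

    // One commit interval.
    void runOneInterval(int batchSize) {
        int end = Math.min(committedOffset + batchSize, source.size());
        List<String> batch = source.subList(committedOffset, end); // consumer.poll()
        target.addAll(batch);            // producer.send() ... producer.flush()
        committedOffset = end;           // consumer.commitSync()
    }
}
```

Because the commit only moves after everything polled so far is durably in the target, no manual per-message offset bookkeeping or dedicated commit thread is needed.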
Hi Jay,
Being late on this thread...
I think people are on the same page that:
1. We need the zero data-loss MM for sure.
2. The handler function itself is not adding much complexity to MM.
3. The current zero data-loss approach based on the data channel seems too
complex.
For 3), I agree with Joel that
Hey Joel,
The data channels are actually a big part of the complexity of the zero
data loss design, though, right? Because then you need some reverse channel
to flow the acks back to the consumer based on where you are versus just
acking what you have read and written (as in the code snippet I put
On Tue, Feb 10, 2015 at 12:13:46PM -0800, Neha Narkhede wrote:
The data channel is irrelevant to the implementation of zero data
I think all of us agree that we want to design MirrorMaker for 0 data loss.
With the absence of the data channel, 0 data loss will be much simpler to
implement.
My arguments for adding a message handler are that:
> 1. It is more efficient to do something in common for all the clients in
> pipeline
I think the message handler adds little to no complexity to the mirror
maker. Jay/Neha, the MM became scary due to the rearchitecture we did
for 0.8 to address performance issues compared with 0.7 - we should remove
the data channel if the simpler design can match the current throughput. I
agree it is worth prototypin
I just updated the KIP page and incorporated Jay and Neha’s suggestion. As
a brief summary of where we are:
Consensus reached:
Have N independent mirror maker threads, each with its own consumer but
sharing a producer. The mirror maker threads will be responsible for
decompression, compression and
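The thread structure just agreed on can be sketched as a toy model (assumption: each consumer is modeled by a per-thread queue and the shared producer by one shared output queue; the real code would use KafkaConsumer/KafkaProducer):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model: N mirror threads, each draining its own consumer (a per-thread
// queue here) into one shared producer (a single shared queue here).
class MirrorThreadsModel {
    static void run(List<BlockingQueue<String>> consumers,
                    BlockingQueue<String> sharedProducer) {
        ExecutorService pool = Executors.newFixedThreadPool(consumers.size());
        for (BlockingQueue<String> consumer : consumers) {
            pool.execute(() -> {
                String msg;
                while ((msg = consumer.poll()) != null) {
                    sharedProducer.add(msg); // every thread funnels into the one producer
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The shared producer keeps the number of connections to the target cluster small while each thread still owns its consumer's offsets.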
Hi Jay, thanks a lot for the comments.
I think this solution is better. We probably don't need the data channel
anymore. It can be replaced with a list of producers if we need more sender
threads.
I’ll update the KIP page.
The reasoning behind the message handler is mainly efficiency. I'm
thinkin
Thanks for the feedback, Neha. Please see inline replies.
-Jiangjie (Becket) Qin
On 2/8/15, 2:40 PM, "Neha Narkhede" wrote:
Yeah, I second Neha's comments. The current mm code has taken something
pretty simple and made it pretty scary with callbacks and wait/notify
stuff. Do we believe this works? I can't tell by looking at it which is
kind of bad for something important like this. I don't mean this as
criticism, I know
Few comments -
1. Why do we need the message handler? Do you have concrete use cases in
mind? If not, we should consider adding it in the future when/if we do have
use cases for it. The purpose of the mirror maker is a simple tool for
setting up Kafka cluster replicas. I don't see why we need to i
Hi Neha,
Yes, I’ve updated the KIP so the entire KIP is based on new consumer now.
I’ve put both designs with and without data channel in the KIP as I still
feel we might need the data channel to provide more flexibility,
especially after message handler is introduced. I’ve put my thinking of
the
Hey Becket,
What are the next steps on this KIP. As per your comment earlier on the
thread -
I do agree it makes more sense
> to avoid duplicate effort and plan based on new consumer. I’ll modify the
> KIP.
Did you get a chance to think about the simplified design that we proposed
earlier? Do y
In mirror maker we do not do de-serialization on the messages. Mirror
maker uses the source TopicPartition hash to choose a producer to send messages
from the same source partition. The partition those messages end up in is
decided by the Partitioner class in KafkaProducer (assuming you are using
the new
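As a sketch of that routing idea (assumed logic, not the exact MirrorMaker code): hashing the source (topic, partition) pair keeps every message from one source partition on the same producer/data channel, which preserves their relative order.

```java
// Assumed routing logic: hash the source (topic, partition) pair so that
// all messages from one source partition always go through the same
// producer / data channel, preserving their order relative to each other.
class ChannelSelector {
    private final int numChannels;

    ChannelSelector(int numChannels) {
        this.numChannels = numChannels;
    }

    // Deterministic: the same source partition always maps to the same channel.
    int channelFor(String sourceTopic, int sourcePartition) {
        int hash = 31 * sourceTopic.hashCode() + sourcePartition;
        return (hash & Integer.MAX_VALUE) % numChannels; // non-negative index
    }
}
```

Note that this uses only the raw source partition, so no key deserializer is needed on the MM instance; the destination partition is still chosen later by the producer's own Partitioner.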
Hi Jiangjie,
Thanks for entertaining my questions so far. The last question I have is about
serialization of the message key. If the key de-serializer (class) is not
present at the MM instance, then does it use the raw byte hashcode to determine
the partition? How are you going to address the situation
Hi Bhavesh,
Please see inline comments.
Jiangjie (Becket) Qin
On 1/29/15, 7:00 PM, "Bhavesh Mistry" wrote:
Hi Jiangjie,
Thanks for the input.
a) Will the MM producer ack be attached to the producer instance or be per
topic? The use case is that one instance of MM
needs to handle both strong acks and also acks=0 for some topics. Or would it
be better to set up another instance of MM?
b) Regarding TCP connectio
Hi Bhavesh,
I think it is the right discussion to have when we are talking about the
new design for MM.
Please see the inline comments.
Jiangjie (Becket) Qin
On 1/28/15, 10:48 PM, "Bhavesh Mistry" wrote:
Hi Jiangjie,
I just wanted to let you know about our use case and stress the point that
the local data center broker cluster has fewer partitions than the destination
offline broker cluster. Just because we do the batch pull from Camus and in
order to drain data faster than the injection rate (from f
I’ve updated the KIP page. Feedback is welcome.
Regarding the simple mirror maker design, I thought it over and have some
worries:
There are two things that might be worth thinking about:
1. One of the enhancements to mirror maker is adding a message handler to
do things like reformatting. I think we migh
Hi Jay and Neha,
Thanks a lot for the reply and explanation. I do agree it makes more sense
to avoid duplicate effort and plan based on new consumer. I’ll modify the
KIP.
To Jay’s question on message ordering - The data channel selection makes
sure that the messages from the same source partition
I think there is some value in investigating if we can go back to the
simple mirror maker design, as Jay points out. Here you have N threads,
each has a consumer and a producer.
The reason why we had to move away from that was a combination of the
difference in throughput between the consumer and
QQ: If we ever use a different technique for the data channel selection
than for the producer partitioning won't that break ordering? How can we
ensure these things stay in sync?
With respect to the new consumer--I really do want to encourage people to
think through how MM will work with the new c
I think that before we start making more changes to Mirror Maker there
should be unit tests in place for it.
Currently Mirror Maker is broken on trunk (there is a patch to fix it)
because of a recent change. That is only going to get more unwieldy as
more changes happen.
On Wed, Jan 21, 2015 at
Hi Jay,
Thanks for comments. Please see inline responses.
Jiangjie (Becket) Qin
On 1/21/15, 1:33 PM, "Jay Kreps" wrote:
>Hey guys,
>
>A couple questions/comments:
>
>1. The callback and user-controlled commit offset functionality is already
>in the new consumer which we are working on in paral
Hey guys,
A couple questions/comments:
1. The callback and user-controlled commit offset functionality is already
in the new consumer which we are working on in parallel. If we accelerated
that work it might help concentrate efforts. I admit this might take
slightly longer in calendar time but co
Currently it is a manual process. For functional tests, I just set up two
Kafka clusters locally, mirror between them and keep producing data to one
of the cluster. Then try a hard kill / bounce mirror maker to see if
messages are lost in target cluster.
Jiangjie (Becket) Qin
On 1/21/15, 12:24 PM,
Thanks for the answers. Much clearer now :)
Unrelated question: How do you test MirrorMaker (especially around data loss)?
I didn't see any unit-tests or integration tests in trunk.
Gwen
On Wed, Jan 21, 2015 at 9:55 AM, Jiangjie Qin wrote:
Hi Gwen,
Please see inline answers. I'll update them in the KIP as well.
Thanks.
Jiangjie (Becket) Qin
On 1/20/15, 6:39 PM, "Gwen Shapira" wrote:
Thanks for the detailed document, Jiangjie. Super helpful.
Few questions:
1. You mention that "A ConsumerRebalanceListener class is created and
could be wired into ZookeeperConsumerConnector to avoid duplicate
messages when consumer rebalance occurs in mirror maker."
Is this something the user n
Hi Kafka Devs,
We are working on Kafka Mirror Maker enhancements. A KIP is posted to document
and discuss the following:
1. KAFKA-1650: No Data loss mirror maker change
2. KAFKA-1839: To allow partition aware mirror.
3. KAFKA-1840: To allow message filtering/format conversion
Feedbacks are wel