Kartik Paramasivam wrote:
>>> Isn't that what Becket is also saying?
>>>
>>> On Aug 28, 2015, at 10:12 PM, Joel Koshy wrote:
I thought we were going
Ewen,
To be clearer about the numbers at LinkedIn, including the expected
growth, we have the following table (assuming an average topic name length
of 27 bytes):
#TopicsMirrored  #MirrorMakerNodes  #ConsumersPerNode  #BytesPerRebalance  #BytesPerMirrorMakerRollingBounce
100
Ewen,
1. My concern is the long subscription set that will be included in the
protocol metadata, because we are using selective copy with a long
subscription set. 100 topics with an average name length of 27 bytes gives
2.7K for topic names and 0.4K for the partition number bytes. This gives
3.1K for eac
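(To make that arithmetic explicit, here is a rough back-of-the-envelope
sketch using the same assumptions, i.e. 100 topics, 27-byte names, and 4
bytes per partition count; the real wire format adds some small per-field
overhead.)

// Rough per-member subscription metadata size; illustrative only.
public class SubscriptionSizeEstimate {
    public static void main(String[] args) {
        int topics = 100;
        int avgTopicNameBytes = 27;
        int partitionCountBytes = 4;

        int nameBytes = topics * avgTopicNameBytes;         // 2700 bytes ~ 2.7K
        int partitionBytes = topics * partitionCountBytes;  // 400 bytes  ~ 0.4K
        int perMember = nameBytes + partitionBytes;         // 3100 bytes ~ 3.1K

        System.out.printf("per-member metadata ~ %.1fK%n", perMember / 1000.0);
    }
}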
@Onur - I think you reviewed the KAFKA-2464 patch in some detail. With
respect to the specific problem you raised about using the largest # of
partitions: you're right, with the updated proposal this isn't possible
because we only send a hash of the metadata in order to avoid the overhead
of per-member metadata be
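(For anyone skimming the thread, a minimal sketch of the kind of metadata
hash being discussed: digest a member's sorted topic-to-partition-count view
and compare hashes instead of shipping the full metadata for every member.
This is illustrative only and not the exact scheme in the KAFKA-2464 patch;
the hash function and encoding here are assumptions.)

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: digest a sorted topic -> partition-count view so members
// can compare a small hash instead of exchanging full metadata.
public class MetadataHashSketch {
    public static byte[] hash(Map<String, Integer> partitionsPerTopic) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // Sort so every member hashes the same bytes for the same view.
        for (Map.Entry<String, Integer> e : new TreeMap<>(partitionsPerTopic).entrySet()) {
            md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md.update((byte) ':');
            md.update(e.getValue().toString().getBytes(StandardCharsets.UTF_8));
            md.update((byte) ';');
        }
        return md.digest();
    }
}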
Hi Neha,
Following are some numbers we have in the pipeline. It would be very
helpful to see how it goes with the proposed protocol. We will try to do
some tests with the current patch as well. Please also let us know if you
want further information.
32 brokers, 1Gbps NIC
547 topics
27 chars aver
Yeah I think I brought this up earlier, but I think delete/recreate with
respect to clients will never be right until we introduce the concept of
versioned topics.
On Fri, Aug 28, 2015 at 1:24 PM, Joel Koshy wrote:
> Another use-case I was thinking of was something like rack-aware
> assignment o
Another use-case I was thinking of was something like rack-aware
assignment of partitions to clients. This would require some
additional topic metadata to be propagated to and from the coordinator
and you would need some way to resolve conflicts for such strategies.
I think that could be addressed
From what I understand, the "largest number of partitions" trick is based
on the assumption that topics can only expand their partitions. What
happens when a topic gets deleted and recreated? This breaks that
assumption.
On Fri, Aug 28, 2015 at 6:33 AM, Neha Narkhede wrote:
> Thanks for re-revi
Thanks for re-reviewing Joel.
On Fri, Aug 28, 2015 at 2:51 AM -0700, "Joel Koshy" wrote:
> I think we think this proposal addresses 100% of the split brain issues
> ever seen in the ZK-based protocol, but I think you think there are still
> issues. Can you explain what you're thinki
> I think we think this proposal addresses 100% of the split brain issues
> ever seen in the ZK-based protocol, but I think you think there are still
> issues. Can you explain what you're thinking of and when you think it would
> happen? I want to make sure you aren't assuming client-side=>split-brai
Hey Joel,
I really don't think we should do both. There are pros and cons but we
should make a decision and work on operationalizing one approach. Much of
really making something like this work is getting all the bugs out, getting
monitoring in place, getting rigorous system tests in place. Trying
I actually feel these tests (whatever they may be) are somewhat
irrelevant here. My main concern with the current client-side proposal
(i.e., without Becket's follow-up suggestions) is that it makes a
significant compromise to the original charter of the new consumer -
i.e., reduce/eliminate
Hey Becket,
In that case, the broker side partition assignment would be ideal because
> it avoids
> issues like metadata inconsistency / split brain / exploding subscription
> set propagation.
As per our previous discussions regarding each of those concerns (referring
to this email thread, KIP c
Hi folks,
After further discussion in LinkedIn, we found that while having a more
general group management protocol is very useful, the vast majority of the
clients will not use a customized partition assignment strategy. In that
case, the broker side partition assignment would be ideal because it a
Follow-up from the kip call:
1. Onur brought up the question of whether this protocol provides enough
coordination capabilities to be generally useful in practice (is that
accurate, Onur?). If it doesn't, then each use case would probably need a
dependence on zookeeper anyway, and we haven't reall
Jun,
Yes, I agree. If the metadata can be synced quickly there should not be an
issue. It just occurred to me that there is a proposal to allow consuming
from followers in the ISR, which could potentially cause more frequent
metadata changes for consumers. Would that be an issue?
Thanks,
Jiangjie (Bec
Hi Jun,
Answers below:
1. When there are multiple common protocols in the JoinGroupRequest, which
one would the coordinator pick?
I was intending to use the list to indicate preference. If all group
members support protocols ["A", "B"] in that order, then we will choose
"A." If some support ["B"
Jason,
Thanks for the writeup. A few comments below.
1. When there are multiple common protocols in the JoinGroupRequest, which
one would the coordinator pick?
2. If the protocols don't agree, the group construction fails. What exactly
does it mean? Do we send an error in every JoinGroupResponse
Ewen,
Honestly, I am not sure whether we can keep all the topics in the same
cluster or not. By default, we will mirror the topics, because when
application teams run their tests, they would expect the environment to be
as similar to production as possible. Also, we are developing a
comprehensive cont
Becket,
Just to clarify, when topics are being created/resized for automated tests,
are those excluded by mirrormaker? I assume you don't want to bother with
copying them since they are just for tests. If they are excluded, then they
don't need to affect the metadata used by the consumer and so wo
Hey Becket,
Thanks for raising these concerns. I think the rolling upgrade problem is
handled quite a bit better with the new consumer if Onur's LeaveGroup patch
(or a variant) is accepted. Without it, the coordinator has to wait for the
full session timeout to detect that a node has left the grou
Hey Becket,
These are all fair points. Regarding running Kafka as a service, it will be
good for everyone to know some numbers around topic creation and changes
around # of partitions. I don't think the test usage is a good one since no
one should be creating and deleting topics in a loop on a pro
Neha, Ewen and Jason,
Maybe I am overly concerned, and I agree that it does depend on the metadata
change frequency. As Neha said, a few tests will be helpful. We can see how
it goes.
What worries me is that at LinkedIn we are in the process of running Kafka
as a service. That means users will have
Becket,
This is a clever approach to ensure that only one thing communicates
the metadata, so even if it is stale, the entire group has the same view.
However, the big assumption this makes is that the coordinator is the one
process that has the ability to know the metadata for group members,
Agreed that talking about actual numbers would be helpful here.
Only 2 things affect this metadata: the list of topics and the number of
partitions. I don't think either of those can change *that* frequently
relative to our 3s default heartbeat and 30s default session timeout, or
even the 5 minute m
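(For reference, those timings map to the following new-consumer configs,
assuming the 5 minute number refers to the metadata refresh interval; this
just writes the defaults out explicitly.)

import java.util.Properties;

// Defaults referenced above, spelled out as explicit consumer configs.
public class ConsumerTimingDefaults {
    public static Properties defaults() {
        Properties props = new Properties();
        props.put("heartbeat.interval.ms", "3000");   // 3s default heartbeat
        props.put("session.timeout.ms", "30000");     // 30s default session timeout
        props.put("metadata.max.age.ms", "300000");   // 5 minute metadata refresh
        return props;
    }
}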
Hey Jiangjie,
I was thinking about the same problem. When metadata is changing
frequently, the clients may not be able to ever find agreement on the
current state. The server doesn't have this problem, as you say, because it
can just take a snapshot and send that to the clients. Adding a dampening
Ewen,
I agree that if there is churn in the metadata, the consumers need several
rounds of rebalances to succeed. The difference I am thinking of is that
with the coordinator as the single source of truth, we can let the consumers
finish one round of rebalance, work for a while, and start the next round of
rebal
Jason,
The way this is handled today (in the old consumer) is by grouping several
rebalance events into one rebalance operation (similar to how we handle log
flushes using "group commit"). The idea is to batch several rebalance
events while one rebalance operation is in progress.
This is naturall
I think metadata churn is an interesting problem and it would be nice if we
had some approach to deal with it. I wonder if it would be sufficient to
have a setting on the client (or the server if we use centralized
assignment) which dampens the rate of rebalancing. Basically the effect
would be to
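(Something like the sketch below is what I have in mind: a hypothetical
dampening delay that coalesces metadata changes arriving within a short
window into a single rebalance. The class and its behavior are made up for
illustration.)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical dampening sketch: each metadata change (re)starts a short
// timer, so a burst of changes collapses into one rebalance instead of many.
public class RebalanceDampener {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long dampenMs;
    private final Runnable triggerRebalance;
    private ScheduledFuture<?> pending;

    public RebalanceDampener(long dampenMs, Runnable triggerRebalance) {
        this.dampenMs = dampenMs;
        this.triggerRebalance = triggerRebalance;
    }

    public synchronized void onMetadataChange() {
        if (pending != null)
            pending.cancel(false);  // coalesce with earlier pending changes
        pending = timer.schedule(triggerRebalance, dampenMs, TimeUnit.MILLISECONDS);
    }
}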
>
> So if you're just subscribing to one or a couple of topics, why not just
> compute the hash by filtering out everything but the topics you are
> subscribed to? The problem there is if you ever add/remove subscriptions
> and want to support rolling upgrades. If the group was subscribed to topic
On Fri, Aug 14, 2015 at 10:59 AM, Jiangjie Qin
wrote:
> Neha and Ewen,
>
> About the metadata change frequency: I guess it really depends on how
> frequently the metadata might change. If we run Kafka as a service, I
> can see that happening from time to time. As I can imagine, people will creat
Neha and Ewen,
About the metadata change frequency: I guess it really depends on how
frequently the metadata might change. If we run Kafka as a service, I can
see that happening from time to time. As I can imagine, people will create
some topic, test, and maybe delete the topic in some automated t
Ewen, I have no problem with adding an explicit version. As you mentioned,
I was thinking that each protocol would manage that in its name field, but
the explicit field is probably clearer for implementors.
Becket, I was also a little concerned that the consumers would have some
difficulty agreein
On Thu, Aug 13, 2015 at 11:07 PM, Neha Narkhede wrote:
> Becket,
>
> As you say, the metadata hash addresses the concern you originally raised
> about large topic subscriptions. Can you please list other problems you are
> raising more clearly? It is more helpful to know problems that the proposa
Becket,
As you say, the metadata hash addresses the concern you originally raised
about large topic subscriptions. Can you please list other problems you are
raising more clearly? It is more helpful to know problems that the proposal
does not address or addresses poorly.
Regarding other things yo
Ewen/Jason,
The metadata hash is a clever approach and certainly addresses the problem
of large metadata for consumers like mirror maker. Few comments -
1. In the interest of simplifying the format of the consumer's metadata
- Why not just always include only the topic names in the metadat
Ewen and Jason,
Thanks for the reply. Sorry I missed the metadata hash. Yes, that is a
clever approach and would address my concern about the data being passed
around. I can see both pros and cons of doing this, though. The advantage is
we don't need the topic metadata in the JoinGroupResponse anymore. The
Thanks Jiangjie, that information helps. I agree the protocol must consider
scalability. My point was that the synchronization barrier in the current
protocol already effectively limits the number of consumers since it
provides no way to gracefully handle churn. It wouldn't be worth worrying
about
Becket,
Did you see the updates to the "ConsumerEmbeddedProtocol" section,
specifically on how regex subscriptions like you're describing are handled?
The updated version removes almost all the metadata in the regex case by
taking advantage of exactly the point you made in (2). The clients don't
a
Jason,
The protocol has to consider scalability. The protocol in the wiki
means the JoinGroupResponse size would be:
NumberOfTopics * (AvgTopicNameLength + 4) * (NumberOfConsumers)^2
To give some real numbers, we have a 26-node mirror maker cluster, each node
with 4 consumers. That is 104 consumers
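(A quick sanity check of that formula as code. The 104 consumers come from
the cluster above; the topic count and name length are just the
10,000-topic / 10-byte sample quoted elsewhere in this thread, so treat the
absolute number as illustrative only.)

// Sketch: evaluate the quoted response-size formula.
public class JoinGroupResponseSizeEstimate {
    static long estimateBytes(long topics, long avgNameLen, long consumers) {
        return topics * (avgNameLen + 4) * consumers * consumers;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000, 10, 104);  // sample inputs, see above
        System.out.printf("~%.1f MB per rebalance round%n", bytes / (1024.0 * 1024.0));
    }
}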
Hey Onur and Jiangjie,
I've updated that wiki with a proposal to add regex subscriptions to the
consumer metadata. Can you have a look to see if it addresses your concern?
In general, I think we should be a little careful when we are talking about
the scalability of the protocol. Regardless of whe
Hey Ewen,
Onur and I discussed this a little bit more, and we are still worried
about passing all the metadata of all consumers around.
Let's say I have a cluster that has 10,000 topics and the average topic name
length is 10 bytes. In this case, the opaque metadata will have 10 * 10,000 =
100KB for top
On Tue, Aug 11, 2015 at 11:29 PM, Jiangjie Qin
wrote:
> Ewen,
>
> Thanks for the explanation.
>
> For (1), I am more concerned about the failure case than the normal case.
> What if a consumer somehow was kicked out of a group but is still consuming
> and committing offsets? Does that mean the ne
Ewen,
Thanks for the explanation.
For (1), I am more concerned about the failure case than the normal case.
What if a consumer somehow was kicked out of a group but is still consuming
and committing offsets? Does that mean the new owner and old owner might
potentially be consuming from and committin
On Tue, Aug 11, 2015 at 10:15 PM, Jiangjie Qin
wrote:
> Hi Jason,
>
> Thanks for writing this up. It would be useful to generalize the group
> concept. I have a few questions below.
>
> 1. In the old consumer, the partition assignment is actually done by the
> consumers themselves. We used ZooKeeper to g
On Tue, Aug 11, 2015 at 10:03 PM, Onur Karaman wrote:
> Just to make the conversation a bit easier (I don't think we have really
> established names for these modes yet), basically with the new
> KafkaConsumer today there's:
> - "external management", where the application figures out the group
>
Hi Jason,
Thanks for writing this up. It would be useful to generalize the group
concept. I have a few questions below.
1. In the old consumer, the partition assignment is actually done by the
consumers themselves. We used ZooKeeper to guarantee that a partition will
only be consumed by one consumer thre
Just to make the conversation a bit easier (I don't think we have really
established names for these modes yet), basically with the new
KafkaConsumer today there's:
- "external management", where the application figures out the group
management and partition assignment externally
- "kafka managemen
Hi Kafka Devs,
One of the nagging issues in the current design of the new consumer has
been the need to support a variety of assignment strategies. We've
encountered this in particular in the design of copycat and the processing
framework (KIP-28). From what I understand, Samza also has a number o