Would I be correct in assuming that a Kafka cluster won't scale well to
support lots (tens of millions) of topics? If I understand correctly, a
node being added or removed would involve a leader election for each topic,
which is a relatively expensive operation?
With that many topics, ZooKeeper will be the main bottleneck. The leader
election process will take a very long time, increasing the unavailability
window of the cluster.
Thanks,
Neha
On Nov 13, 2013 4:49 AM, "Joe Freeman" wrote:
> Would I be correct in assuming that a Kafka cluster won't scale well to
> sup
I didn't see any automatic leader election when adding nodes. The data is
still skewed toward the old nodes. Do you have to force it by running a script?
On Wed, Nov 13, 2013 at 6:41 AM, Neha Narkhede wrote:
> At those many topics, zookeeper will be the main bottleneck. Leader
> election process will take ver
Since you have a cluster, why not distribute the consumers in different
nodes instead of threads. I think that's the only way to scale up with
kafka.
Question here: as the number of high-level consumers keeps growing, does
ZooKeeper become a bottleneck?
On Tue, Nov 12, 2013 at 9:27 PM, Jun Rao w
Thanks!
On Tue, Nov 12, 2013 at 12:29 PM, Joe Stein wrote:
> Hi Siyuan, we have a fix for this in 0.8.0 which is in progress being
> released.
>
> If you find the pom breaking for any reason (some build systems have
> problems with the bad pom) you can use the direct apache repository
> https:/
I'm working on a fault-tolerant consumer group. The idea is this: to
maximize Kafka throughput, I request the metadata from the broker, create
#{num of partitions} consumers for each topic, and distribute them across
different nodes. Moreover, there is a mechanism to detect the failure of any
node and re
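The distribution step described above (one consumer per partition, spread over several nodes) could look roughly like the sketch below. It is only an illustration under assumptions: `assign_partitions`, the node names, and the topic names are hypothetical, and the metadata request to the broker is replaced by a plain dict of partition counts. Failure detection is not modelled.

```python
from collections import defaultdict
from itertools import cycle


def assign_partitions(partitions_per_topic, nodes):
    """Spread one consumer per (topic, partition) across nodes round-robin.

    partitions_per_topic: dict topic -> number of partitions (a stand-in for
        the metadata a broker would return; hypothetical input shape).
    nodes: list of node names that can host consumers.
    """
    if not nodes:
        raise ValueError("need at least one node")
    assignment = defaultdict(list)
    node_cycle = cycle(nodes)
    # Sort topics so the result is deterministic across runs.
    for topic in sorted(partitions_per_topic):
        for partition in range(partitions_per_topic[topic]):
            assignment[next(node_cycle)].append((topic, partition))
    return dict(assignment)


plan = assign_partitions({"t1": 3, "t2": 1}, ["n1", "n2"])
```

Each node then starts one consumer per (topic, partition) pair in its list, so no partition is read by more than one consumer.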
If this is reproducible and you have logs, that would help. In short,
though, yes: if you start up the replacement instance before the old
consumer instance's session is actually expired by ZooKeeper, you could
run into rebalance exceptions (in which case you should see conflicts
in your consumer logs).
On Wed, Nov 13, 2013 at 11:54:07AM -0800, hsy...@gmail.com wrote:
> Since you have a cluster, why not distribute the consumers in different
> nodes instead of threads. I think that's the only way to scale up with
> kafka.
Depending on your CPU-specs you should be able to add threads to scale
out -
A node being added will generally not lead to any leader election. A node
being removed will lead to leader election. You can force leader election
using the preferred replica election command -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaL
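To make the behaviour above concrete, here is a toy model in Python. It is not Kafka's controller code: the function names and the data layout are made up for illustration, and in-sync replica tracking is ignored. It only shows why adding a broker changes no leaders, why removing one triggers elections for the partitions it led, and why leadership stays skewed until a preferred replica election is run.

```python
def elect_leaders(assignment, leaders, live_brokers):
    """Re-elect leaders only for partitions whose current leader is gone.

    assignment: dict partition -> ordered list of replica broker ids
    leaders: dict partition -> current leader broker id
    live_brokers: set of broker ids that are currently alive
    """
    new_leaders = {}
    for partition, replicas in assignment.items():
        current = leaders.get(partition)
        if current in live_brokers:
            new_leaders[partition] = current  # leader alive: no election
        else:
            alive = [b for b in replicas if b in live_brokers]
            new_leaders[partition] = alive[0] if alive else None
    return new_leaders


def preferred_replica_election(assignment, leaders, live_brokers):
    """Move leadership back to the first listed replica when it is alive."""
    return {
        p: replicas[0] if replicas[0] in live_brokers else leaders.get(p)
        for p, replicas in assignment.items()
    }


assignment = {"t-0": [1, 2], "t-1": [2, 3], "t-2": [3, 1]}
leaders = {"t-0": 1, "t-1": 2, "t-2": 3}

after_add = elect_leaders(assignment, leaders, {1, 2, 3, 4})       # broker 4 joins
after_remove = elect_leaders(assignment, leaders, {2, 3})          # broker 1 leaves
after_return = elect_leaders(assignment, after_remove, {1, 2, 3})  # broker 1 is back
rebalanced = preferred_replica_election(assignment, after_return, {1, 2, 3})
```

In this model `after_add` equals the original leader map, `after_remove` moves only `t-0` to broker 2, and leadership returns to broker 1 only after the explicit preferred replica election.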
ZooKeeper will not be the only problem. The first is that each topic
(strictly, each partition of a topic) is a directory on the file system.
Each of those is going to have files inside it. This is going to be fairly
overwhelming for the file system. Also, I cannot speak for the internals,
but there may be cases where this many topics
a
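A back-of-envelope estimate of the file system load can be sketched as follows. The function name and all input numbers are assumptions chosen for illustration. It assumes one directory per partition replica and at least two files per directory (a log segment plus its index), with replicas spread evenly over the brokers.

```python
def estimate_fs_load(topics, partitions_per_topic, replication_factor,
                     brokers, files_per_partition=2):
    """Rough per-broker count of partition directories and files.

    files_per_partition=2 assumes a single log segment and its index file;
    real partitions with several segments would hold more files.
    """
    replicas = topics * partitions_per_topic * replication_factor
    dirs_per_broker = -(-replicas // brokers)  # ceiling division
    return dirs_per_broker, dirs_per_broker * files_per_partition


# Hypothetical cluster: 10 million single-partition topics, no replication,
# spread over 10 brokers.
dirs, files = estimate_fs_load(10_000_000, 1, 1, 10)
```

Under these assumptions each broker would hold about a million directories and at least two million files, before any replication or extra partitions are added.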