Kafka cluster with lots of topics

2013-11-13 Thread Joe Freeman
Would I be correct in assuming that a Kafka cluster won't scale well to support lots (tens of millions) of topics? If I understand correctly, a node being added or removed would involve a leader election for each topic, which is a relatively expensive operation?

Re: Kafka cluster with lots of topics

2013-11-13 Thread Neha Narkhede
With that many topics, ZooKeeper will be the main bottleneck. The leader election process will take a very long time, increasing the unavailability window of the cluster. Thanks, Neha On Nov 13, 2013 4:49 AM, "Joe Freeman" wrote: > Would I be correct in assuming that a Kafka cluster won't scale well to > sup
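To get a feel for the size of that unavailability window, here is a rough back-of-the-envelope sketch. It assumes leaders are spread evenly over the brokers and that elections for a failed broker's partitions run one at a time; the function name and the 5 ms per-partition cost are illustrative assumptions, not measured Kafka figures.

```python
def estimate_failover_seconds(total_partitions, brokers, per_partition_ms):
    """Rough unavailability window when a single broker fails.

    Assumption: leaders are evenly spread and each affected partition
    needs one sequential election costing per_partition_ms.
    """
    partitions_led_by_failed_broker = total_partitions / brokers
    return partitions_led_by_failed_broker * per_partition_ms / 1000.0


# Hypothetical numbers: 10 million single-partition topics on 10 brokers.
print(estimate_failover_seconds(10_000_000, 10, 5))
```

Even with an optimistic per-partition cost, the total grows linearly with the number of partitions the failed broker was leading.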

Re: Kafka cluster with lots of topics

2013-11-13 Thread hsy...@gmail.com
I didn't see any automatic leader election when adding nodes. The data is still skewed toward the old nodes. Do you have to force it by running a script? On Wed, Nov 13, 2013 at 6:41 AM, Neha Narkhede wrote: > At those many topics, zookeeper will be the main bottleneck. Leader > election process will take ver

Re: High level consumer Blocked when there is still message in topic

2013-11-13 Thread hsy...@gmail.com
Since you have a cluster, why not distribute the consumers across different nodes instead of threads? I think that's the only way to scale up with Kafka. A question here: as more and more high-level consumers are added, does ZooKeeper become a bottleneck? On Tue, Nov 12, 2013 at 9:27 PM, Jun Rao w

Re: pom warning

2013-11-13 Thread hsy...@gmail.com
Thanks! On Tue, Nov 12, 2013 at 12:29 PM, Joe Stein wrote: > Hi Siyuan, we have a fix for this in 0.8.0 which is in progress being > released. > > If you find the pom breaking for any reason (some build systems have > problems with the bad pom) you can use the direct apache repository > https:/

A problem of fault-tolerant high-level consumer group

2013-11-13 Thread hsy...@gmail.com
I'm working on a fault-tolerant consumer group. The idea is to maximize Kafka throughput: I request the metadata from the broker, create one consumer per partition for each topic, and distribute them across different nodes. Moreover, there is a mechanism to detect the failure of any node and re
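The placement described above (one consumer per partition, spread over nodes) could be sketched roughly as follows. This is only an illustration under assumed inputs: `partitions_per_topic` stands in for whatever the broker metadata returns, and the node names are hypothetical. It is not the poster's actual implementation.

```python
def assign_consumers(partitions_per_topic, nodes):
    """Spread one consumer per (topic, partition) across nodes, round-robin."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    slot = 0
    for topic in sorted(partitions_per_topic):
        for partition in range(partitions_per_topic[topic]):
            node = nodes[slot % len(nodes)]
            assignment[node].append((topic, partition))
            slot += 1
    return assignment


# Hypothetical metadata: topic "a" has 3 partitions, topic "b" has 1.
print(assign_consumers({"a": 3, "b": 1}, ["n1", "n2"]))
```

When a node failure is detected, calling the function again with only the surviving nodes gives a new placement.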

Re: A problem of fault-tolerant high-level consumer group

2013-11-13 Thread Joel Koshy
If this is reproducible and you have logs, that would help. In short, though: yes, if you start up the replacement instance before the old consumer instance's session has actually been expired by ZooKeeper, you could run into rebalance exceptions (in which case you should see conflicts in your consumer logs)
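One way to avoid starting the replacement too early is to hold it back until the old session should have expired. The sketch below assumes you know the consumer's configured ZooKeeper session timeout and the time at which the failure was detected; the function name and the safety margin are hypothetical, and ZooKeeper itself decides the real expiry moment, so treat the result as a conservative guess.

```python
def seconds_until_safe_restart(detected_at, session_timeout_ms, now, margin_s=1.0):
    """Seconds to wait before starting a replacement consumer.

    detected_at and now are timestamps in seconds from the same clock.
    margin_s is an assumed safety buffer on top of the session timeout.
    """
    deadline = detected_at + session_timeout_ms / 1000.0 + margin_s
    return max(deadline - now, 0.0)


# Failure seen at t=100s, 6s session timeout, current time t=103s.
print(seconds_until_safe_restart(100.0, 6000, 103.0))
```

A supervisor process could sleep for the returned number of seconds before launching the new instance.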

Re: High level consumer Blocked when there is still message in topic

2013-11-13 Thread Joel Koshy
On Wed, Nov 13, 2013 at 11:54:07AM -0800, hsy...@gmail.com wrote: > Since you have a cluster, why not distribute the consumers in different > nodes instead of threads. I think that's the only way to scale up with > kafka. Depending on your CPU-specs you should be able to add threads to scale out -

Re: Kafka cluster with lots of topics

2013-11-13 Thread Neha Narkhede
A node being added will generally not lead to any leader election. A node being removed will lead to leader election. You can force leader election using the preferred replica election command - https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaL
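To see which partitions a preferred replica election would actually move, you can compare each partition's current leader with its preferred replica, which is the first broker in the partition's assigned replica list. The sketch below works on plain dictionaries; how you obtain the assignment and the current leaders is left out, and the function name is hypothetical.

```python
def partitions_needing_preferred_election(assigned_replicas, current_leaders):
    """Return partitions whose leader is not the first assigned replica."""
    skewed = []
    for partition, replicas in sorted(assigned_replicas.items()):
        preferred = replicas[0]
        if current_leaders.get(partition) != preferred:
            skewed.append(partition)
    return skewed


# Hypothetical state: broker 1 was restarted, so broker 2 leads everything.
assigned = {("t", 0): [1, 2], ("t", 1): [2, 1]}
leaders = {("t", 0): 2, ("t", 1): 2}
print(partitions_needing_preferred_election(assigned, leaders))
```

An empty result means leadership already matches the preferred replicas and forcing an election would change nothing.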

Re: Kafka cluster with lots of topics

2013-11-13 Thread Edward Capriolo
ZooKeeper will not be the only problem. The first is that each topic is a directory on the file system. Each of those is going to have files inside it. This is going to be fairly overwhelming for the file system. Also, I cannot speak for the internals, but there may be cases where this many topics a
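The file system load can be estimated with simple arithmetic. Kafka keeps one directory per partition replica, each holding at least a log segment and an index file. The sketch below assumes replicas are spread evenly over the brokers and uses a minimum of two files per directory; the function name and the example numbers are illustrative.

```python
def estimate_fs_entries(topics, partitions_per_topic, replication_factor,
                        brokers, files_per_partition=2):
    """Return (directories, files) per broker, assuming an even spread."""
    replicas = topics * partitions_per_topic * replication_factor
    dirs_per_broker = -(-replicas // brokers)  # ceiling division
    return dirs_per_broker, dirs_per_broker * files_per_partition


# Hypothetical cluster: 10 million single-partition topics,
# replication factor 2, 10 brokers.
print(estimate_fs_entries(10_000_000, 1, 2, 10))
```

With these numbers each broker would hold about two million directories and at least four million files, before counting any additional log segments.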