From my (beginner's) knowledge, each partition still requires at least one file descriptor on the Kafka brokers. With the new consumer design, consumers no longer store their data in ZooKeeper, but topic and partition metadata still lives there.
What I would do is key by your ID and place a rate-limiting stream processor in front of your heavier processors. This could be a windowed task that counts how many messages each ID has sent in the last few seconds or minutes. IDs under the limit are sent to a high-priority topic; IDs over the limit go to a lower-priority topic.

Ben

On Tuesday, May 3, 2016, David Shepherd <dtsheph...@gmail.com> wrote:
> I was wondering if the new Kafka consumer introduced in 0.9.0 (
> http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
> ) allows for a higher number of partitions in a given cluster, since it
> removes the ZooKeeper dependency. I understand the file descriptor and
> availability concerns discussed here:
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> The reason I ask is that we'd like to use partitioning to limit the
> impact of a message flood on our downstream consumers. If we can partition
> by a particular ID, it will isolate message floods from a given source into
> a single partition, which allows us to allocate a single consumer to process
> that flood without affecting quality of service for the rest of the system.
> Unfortunately, partitioning this way could create millions of partitions,
> each producing only a few messages per minute, with the exception of a few
> partitions that will be sending thousands of messages per minute.
>
> I'm also open to suggestions on how others have solved the flooding /
> noisy-neighbor problem in Kafka.
>
> Thanks,
> Dave Shepherd

--
Benjamin Manns
benma...@gmail.com
(434) 321-8324
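The windowed rate-limiting idea above could be sketched roughly like this. This is a minimal, Kafka-free illustration of the routing decision only: the topic names, the per-minute limit, and the sliding-window approach are my assumptions, and in a real deployment the `route` result would feed a producer send to the chosen topic.

```python
import time
from collections import defaultdict, deque

class RateLimitRouter:
    """Routes each message to a high- or low-priority topic based on a
    per-ID sliding-window message count.

    The topic names and threshold below are illustrative assumptions,
    not anything specified in the thread.
    """

    def __init__(self, limit=100, window_secs=60,
                 fast_topic="events-fast", slow_topic="events-slow"):
        self.limit = limit
        self.window_secs = window_secs
        self.fast_topic = fast_topic
        self.slow_topic = slow_topic
        # Per-ID timestamps of messages seen inside the window.
        self.seen = defaultdict(deque)

    def route(self, msg_id, now=None):
        now = time.time() if now is None else now
        window = self.seen[msg_id]
        # Evict timestamps that have aged out of the window.
        while window and window[0] <= now - self.window_secs:
            window.popleft()
        window.append(now)
        # Under-limit IDs keep the high-priority topic; floods are
        # diverted to the lower-priority topic.
        return self.fast_topic if len(window) <= self.limit else self.slow_topic
```

A separate pool of consumers can then drain the low-priority topic at its own pace, so a flooding ID degrades only its own latency rather than everyone else's.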