Thanks Ben, that's what I thought, and I believe your suggestion is essentially what I planned to implement. We have a single topic with raw messages that is partitioned randomly on ingest (just for scalability). I planned to install a consumer-group router that reads from this "raw" topic and routes messages to "normal" or "throttled" topics. Both of those topics would be partitioned by the ID, since I need to guarantee that a single consumer processes all messages for a given ID. Routing is very fast, while processing each message is much slower.
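The routing decision described above can be sketched without tying it to any particular Kafka client library: a sliding-window counter per ID that picks "normal" or "throttled" as the destination topic. The class name, threshold, and window size below are all illustrative assumptions, not anything from the thread.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RateRouter:
    """Illustrative per-ID rate limiter: pick a destination topic by recent volume."""

    def __init__(self, max_per_window: int, window_secs: float):
        self.max_per_window = max_per_window
        self.window_secs = window_secs
        # ID -> timestamps of that ID's recent messages
        self.events = defaultdict(deque)

    def route(self, msg_id: str, now: Optional[float] = None) -> str:
        """Return "normal" or "throttled" for this message's ID."""
        now = time.monotonic() if now is None else now
        window = self.events[msg_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window_secs:
            window.popleft()
        window.append(now)
        return "normal" if len(window) <= self.max_per_window else "throttled"
```

In the actual router, the consumer loop would read each record from the raw topic, call `route()` with the record's ID, and produce to the returned topic keyed by that ID, so the downstream topics stay partitioned by ID as described.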
Know of any existing rate-based message routers between Kafka topics?

-Dave

On Tue, May 3, 2016 at 11:42 PM Benjamin Manns <benma...@gmail.com> wrote:

> From my knowledge (beginner's), each partition still requires at least a
> file descriptor on the Kafka brokers. The new consumer structure means
> consumers won't store data in ZooKeeper, but topics and partitions still
> do.
>
> What I would do is key by your ID and place a rate-limiting stream
> processor in front of your heavier processors. This could be a windowed
> task that counts how many messages have been sent in the last few seconds
> or minutes. For under-limit IDs, send to a high-priority topic; for
> over-limit IDs, a lower-priority topic.
>
> Ben
>
> On Tuesday, May 3, 2016, David Shepherd <dtsheph...@gmail.com> wrote:
>
> > I was wondering if the new Kafka consumer introduced in 0.9.0 (
> > http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
> > ) allows for a higher number of partitions in a given cluster, since it
> > removes the ZooKeeper dependency. I understand the file descriptor and
> > availability concerns discussed here:
> > http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> >
> > The reason I ask is because we'd like to use partitioning to limit the
> > impact of a message flood on our downstream consumers. If we can
> > partition by a particular ID, it will isolate message floods from a
> > given source into a single partition, which allows us to allocate a
> > single consumer to process that flood without affecting quality of
> > service to the rest of the system. Unfortunately, partitioning this way
> > could create millions of partitions, each only producing a few messages
> > per minute, with the exception that a few of the partitions will be
> > sending thousands of messages per minute.
> >
> > I'm also open to suggestions on how others have solved the flooding /
> > noisy neighbor problem in Kafka.
> >
> > Thanks,
> > Dave Shepherd
>
> --
> Benjamin Manns
> benma...@gmail.com
> (434) 321-8324
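The isolation property the quoted question relies on — every message for a given ID landing on the same partition — comes from keyed partitioning. Kafka's default partitioner hashes the key bytes (with murmur2) modulo the partition count; the stand-in below uses md5 purely as an illustration of the same deterministic mapping, not as Kafka's actual algorithm.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition, as a keyed producer does.

    Illustrative stand-in only: real Kafka uses murmur2 on the key bytes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, a flooding ID fills exactly one partition, and a single consumer in the group owns all of that ID's traffic while the remaining partitions keep flowing normally.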