I think the problem you are describing is that if a single broker is slow, all producers will come to a halt (because they all talk to that broker).
We don't have a great solution for this at the moment.

In our own usage, for the first tier of data collection, each producer connects to a single broker and sends all its data there; if that broker dies, the producer reconnects. This mitigates the problem somewhat: if only 1 of n brokers is slow, only 1/nth of the producers are affected. It does not allow any semantic partitioning by key, though. You should be able to get the same effect with a custom partitioner that chooses a random partition and sticks with it instead of round-robining (a rough sketch is at the end of this message, below the quoted text).

A more sophisticated solution might detect slow brokers and shoot them in the head. If the detection works correctly and the underlying cause is a hardware problem or another process on the machine, then just killing the node would fix the problem. However, if the problem is simply load, this will probably make things worse. It is also a bit tricky to define what "slow" means and to have the user configure that accurately. It would be easy to imagine a half-assed implementation causing more problems than it fixed.

-Jay

On Fri, Jan 4, 2013 at 12:24 AM, Raghu Angadi <ra...@angadi.org> wrote:
> The producer distributes messages uniformly across the partitions.
>
> This does not work very well when some of the brokers are much slower
> than others. Is there a way to temporarily avoid such slow brokers?
>
> With async producers, I could avoid the producers that have a lot more
> messages in their internal queue than the others (through my own
> Partitioners). But the queue size is not available. I tried to maintain
> my own estimate of the queue size using the CallbackHandler, but the API
> does not seem to provide enough info (it provides the partition id, but
> not the broker id; plus, CallbackHandler seems to be removed in 0.8).
>
> Any suggestions?
>
> Kafka version: 0.7.1
>
> thanks,
> Raghu.
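
P.S. A minimal sketch of what I mean by a "sticky" partitioner, assuming the 0.7 producer's kafka.producer.Partitioner trait with its partition(key, numPartitions) method (the class name here is just illustrative, not anything that ships with Kafka):

    import java.util.Random;

    import kafka.producer.Partitioner;

    // Illustrative "sticky" partitioner for the 0.7 producer: instead of
    // round-robining, pick one partition at random the first time we are
    // asked and keep returning it, so each producer instance stays pinned
    // to a single broker partition.
    public class StickyRandomPartitioner implements Partitioner<String> {

        private final Random random = new Random();
        private int chosen = -1;

        @Override
        public int partition(String key, int numPartitions) {
            // Pick (or re-pick) a partition if we have none yet, or if the
            // partition count shrank and our choice is now out of range.
            if (chosen < 0 || chosen >= numPartitions) {
                chosen = random.nextInt(numPartitions);
            }
            return chosen;
        }
    }

You would point the producer at it with the "partitioner.class" property (e.g. props.put("partitioner.class", "com.example.StickyRandomPartitioner")). Note that the Partitioner interface changes in 0.8, so this would need adjusting there.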