[ 
https://issues.apache.org/jira/browse/KAFKA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955764#comment-16955764
 ] 

Werner Daehn commented on KAFKA-8812:
-------------------------------------

@all 

As feared, this request has not attracted any attention, simply because it is 
an internal optimization for Kafka and Kafka Connect.

Can anybody decide whether this is worth the effort or not?

My argument goes like this:

If you use Kafka Connect, the library handles the rebalance when a worker node 
fails. How does it do that internally? For data sinks it relies on the Kafka 
consumer rebalance; for data sources it uses its own mechanism, because there 
is no support for rebalancing Kafka producers.

For a Kafka Connect user this is fine, but it would do no harm to have both 
consumer and producer rebalancing in the Kafka server.

People using the plain Kafka APIs, however, have to reinvent producer 
balancing themselves.

Hence the request to move that logic into the Kafka server.

Thanks in advance

> Rebalance Producers
> -------------------
>
>                 Key: KAFKA-8812
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8812
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3.0
>            Reporter: Werner Daehn
>            Assignee: Werner Daehn
>            Priority: Major
>              Labels: kip
>
> KIP-509: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-509%3A+Rebalance+and+restart+Producers
> Please bear with me. At first this idea may sound stupid, but it has its 
> merits.
>  
> How do you build a distributed producer at the moment? You use Kafka Connect, 
> which in turn requires a cluster that decides which instance produces which 
> partitions.
> On the consumer side it is different. There Kafka itself does the data 
> distribution. If you have 10 Kafka partitions and 10 consumers, each will get 
> data for one partition. With 5 consumers, each will get data from two 
> partitions. And if there is only a single consumer active, it gets all data. 
> All is managed by Kafka, all you have to do is start as many consumers as you 
> want.
>  
> I'd like to suggest something similar for producers. A producer would 
> tell Kafka that its source has 10 partitions. The Kafka server then responds 
> with the list of partitions this instance is responsible for. If it is 
> the only producer, the response would be all 10 partitions. When a second 
> instance starts up, the first instance would be told to produce data for 
> partitions 1-5 and the new one for partitions 6-10. If a producer fails to 
> respond with a keep-alive packet, a rebalance happens: the remaining active 
> producer is told to take over more load, and the dead producer gets an error 
> when it tries to send data again.
> For a restart, the producer rebalance of course also has to send the point 
> from which to resume producing data. Ideally this is a user-generated 
> pointer and not the topic offset. It could then be, e.g., the database 
> system change number, a database transaction ID or something similar.
>  
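
The assignment described in the issue above could be sketched roughly as 
follows. This is only an illustration under stated assumptions, not an existing 
Kafka API: `ProducerCoordinator`, `assign_ranges`, `join`, `leave` and `commit` 
are hypothetical names, the even range split is just one possible strategy 
(chosen because it reproduces the 1-5 / 6-10 example), and the restart pointer 
is treated as an opaque user-supplied string such as a database system change 
number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def assign_ranges(num_partitions: int, members: List[str]) -> Dict[str, List[int]]:
    """Split source partitions 1..num_partitions into contiguous ranges,
    one range per member, as evenly as possible."""
    if not members:
        return {}
    base, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 1
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        size = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + size))
        start += size
    return assignment


@dataclass
class ProducerCoordinator:
    """Hypothetical stand-in for the server-side logic the issue asks for."""
    num_source_partitions: int
    members: List[str] = field(default_factory=list)
    # Source partition -> user-generated restart pointer (not a topic offset).
    restart_pointers: Dict[int, str] = field(default_factory=dict)

    def join(self, producer_id: str) -> Dict[str, Dict[int, Optional[str]]]:
        """A producer starts up; every member gets a new assignment."""
        self.members.append(producer_id)
        return self.assignment()

    def leave(self, producer_id: str) -> Dict[str, Dict[int, Optional[str]]]:
        """A producer missed its keep-alive; the others take over its load."""
        self.members.remove(producer_id)
        return self.assignment()

    def commit(self, partition: int, pointer: str) -> None:
        """Record how far a source partition has been produced."""
        self.restart_pointers[partition] = pointer

    def assignment(self) -> Dict[str, Dict[int, Optional[str]]]:
        """Per member: its source partitions and where to resume each one
        (None means no pointer has been committed yet)."""
        ranges = assign_ranges(self.num_source_partitions, self.members)
        return {
            member: {p: self.restart_pointers.get(p) for p in partitions}
            for member, partitions in ranges.items()
        }
```

With `ProducerCoordinator(10)`, a first `join` hands all 10 source partitions 
to that producer; a second `join` splits them into 1-5 and 6-10; and a `leave` 
for the second producer returns all 10 to the first one, together with any 
pointers committed in the meantime.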



--
This message was sent by Atlassian Jira
(v8.3.4#803005)