[ 
https://issues.apache.org/jira/browse/KAFKA-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105613#comment-17105613
 ] 

Guozhang Wang commented on KAFKA-9953:
--------------------------------------

Thanks [~jwijgerd]. Just to clarify, I'm not questioning the value of adding 
this support; I think it is good to have such a feature. I'm just curious to 
learn what your motivation is for having a many-to-one consumer -> producer 
mapping :)

My understanding is that there are two cases that require more consumer 
groups: a) you have differently formatted data representing different 
components, and hence need different deserialization logic, and b) for a 
single data format, records may need to be filtered on certain fields 
depending on what they are used for.

Here are some wild ideas off the top of my head that you may consider. Of 
course, all of them rest on the assumption that with partitioned topics we 
would overwhelm the broker before we saturate the network throughput:

For a), it is okay to have a single consumer fetching from the different 
topics, since the serde API takes the topic name as a parameter, so you can 
select different deserialization logic based on it. That way you may need 
fewer consumer groups.
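To illustrate the idea, here is a minimal sketch of topic-based dispatch. The class name and decoder registry are hypothetical, not Kafka code; the `deserialize(String topic, byte[] data)` shape mirrors the contract of Kafka's `Deserializer` interface, which is handed the topic name along with the raw bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one consumer subscribed to several topics can still
// apply per-topic decoding, because the serde API receives the topic name.
// We mimic that contract here without the kafka-clients dependency.
class TopicAwareDeserializer {
    private final Map<String, Function<byte[], Object>> decoders;

    TopicAwareDeserializer(Map<String, Function<byte[], Object>> decoders) {
        this.decoders = decoders;
    }

    // Same shape as org.apache.kafka.common.serialization.Deserializer#deserialize
    Object deserialize(String topic, byte[] data) {
        Function<byte[], Object> decoder = decoders.get(topic);
        if (decoder == null) {
            throw new IllegalArgumentException("no decoder registered for topic " + topic);
        }
        return decoder.apply(data);
    }
}
```

A real implementation would register one decoder per subscribed topic and plug the class in as the consumer's value deserializer, so a single consumer group covers all the formats.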

For b), beyond the number of consumers there's an additional overhead: the 
same bytes are sent N times, with each consumer filtering on different field 
values and dropping the rest on the floor -- a bit wasteful :) An alternative 
approach would be to keep a single consumer that reads every DomainEvent and 
"defer" the filtering to a later stage: after fetching the data, put the 
records into different buffers inside the client based on their types, and 
then, depending on which types your client is currently interested in, poll 
from the corresponding buffers while dropping the others. By doing so the 
bytes are sent only once.
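The deferred-filtering idea above might be sketched like this. All names here are hypothetical, not Kafka APIs; the point is just the routing of each fetched record into a per-type in-memory buffer:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch of "defer the filtering": a single consumer reads every
// DomainEvent once, routes each record into a buffer keyed by its type, and
// the application polls only the buffers it currently cares about. The bytes
// cross the network once instead of N times.
class TypeBuffers<T> {
    private final Map<String, Queue<T>> buffers = new HashMap<>();
    private final Function<T, String> typeOf;

    TypeBuffers(Function<T, String> typeOf) {
        this.typeOf = typeOf;
    }

    // Called once per record fetched by the single consumer.
    void add(T record) {
        buffers.computeIfAbsent(typeOf.apply(record), k -> new ArrayDeque<>()).add(record);
    }

    // Poll only the type we are interested in; other buffers are untouched.
    T poll(String type) {
        Queue<T> q = buffers.get(type);
        return (q == null) ? null : q.poll();
    }

    // Drop a buffer whose type we are no longer interested in.
    void drop(String type) {
        buffers.remove(type);
    }
}
```

In practice the buffers would need bounding (or backpressure on the consumer) so that types nobody polls don't grow without limit.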

> support multiple consumerGroupCoordinators in TransactionManager
> ----------------------------------------------------------------
>
>                 Key: KAFKA-9953
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9953
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 2.5.0
>            Reporter: Joost van de Wijgerd
>            Priority: Major
>         Attachments: KAFKA-9953.patch
>
>
> We are using Kafka with a transactional producer and have the following use 
> case: 3 KafkaConsumers (each with its own consumer group) polled by the same 
> thread, and 1 transactional Kafka producer. When we add the offsets to the 
> transaction we run into the following problem: the TransactionManager only 
> keeps track of 1 consumerGroupCoordinator, but some consumerGroupCoordinators 
> may live on other nodes. As a result we constantly see the TransactionManager 
> switching between nodes, which costs 1 failing _TxnOffsetCommitRequest_ and 
> 1 unnecessary _FindCoordinatorRequest_ per switch.
> Also, with _retry.backoff.ms_ set to its default of 100, this causes a pause 
> of 100ms for every other transaction (depending on which KafkaConsumer 
> triggered the transaction, of course).
> If the TransactionManager kept track of coordinator nodes per 
> consumerGroupId, this problem would be solved. I already have a patch for 
> this but still need to test it. I will add it to the ticket when that is 
> done.
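The shape of the proposed fix can be sketched as follows. This is a hypothetical illustration, not the actual TransactionManager patch: a per-group cache replacing a single coordinator field, so committing offsets for several consumer groups no longer ping-pongs between nodes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Kafka's TransactionManager code): cache the
// coordinator node per consumerGroupId instead of holding a single
// consumerGroupCoordinator, so a transactional producer committing offsets
// for several groups does not keep re-discovering coordinators.
class CoordinatorCache {
    private final Map<String, Integer> coordinatorNodeByGroupId = new HashMap<>();

    // Record the node learned from a FindCoordinatorRequest for this group.
    void update(String consumerGroupId, int nodeId) {
        coordinatorNodeByGroupId.put(consumerGroupId, nodeId);
    }

    // Cached coordinator for the group; null means a fresh lookup is needed.
    Integer coordinatorFor(String consumerGroupId) {
        return coordinatorNodeByGroupId.get(consumerGroupId);
    }

    // Invalidate on a NOT_COORDINATOR-style error so the next
    // TxnOffsetCommitRequest triggers a new FindCoordinatorRequest.
    void invalidate(String consumerGroupId) {
        coordinatorNodeByGroupId.remove(consumerGroupId);
    }
}
```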



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
