Hello Guozhang, thanks for the response. I have some doubts about the "N-1 producer-consumer" case you mentioned, and about why and how I would need to configure the transactional id there. Is this the case where N consumers share the same producer?
My current implementation creates a consumer per topic (I don't subscribe to multiple topics from the same consumer) and starts a producer per consumer, so the relation is 1 consumer/topic => 1 producer, and the transactional id is set as <consumer-group>-<topic>-<random-uuid>. Do you see any problem with this configuration? (I've pasted a couple of rough sketches at the bottom of this mail to make it concrete.) Thanks again.

On Sat, May 21, 2022 at 16:37, Guozhang Wang (<wangg...@gmail.com>) wrote:

> Hello Gabriel,
>
> What you're asking is a very fair question :) In fact, for Streams, where
> the partition assignment to producer-consumer pairs is purely flexible, we
> think the new EOS would not have a hard requirement on transactional.id:
> https://issues.apache.org/jira/browse/KAFKA-9453
>
> If you implemented the transactional messaging via a DIY producer+consumer
> though, it depends on what you expect the life-time of a producer to be,
> e.g. if you do not have a 1-1 producer-consumer mapping then
> transactional.id is not crucial, but if you have an N-1 producer-consumer
> mapping then you may still need to configure that id.
>
>
> Guozhang
>
>
>
> On Fri, May 20, 2022 at 8:39 AM Gabriel Giussi <gabrielgiu...@gmail.com>
> wrote:
>
> > Before KIP-447 I understood the use of transactional.id to prevent us
> > from zombies introducing duplicates, as explained in this talk:
> > https://youtu.be/j0l_zUhQaTc?t=822
> > So in order to get zombie fencing working correctly we should assign
> > producers a transactional.id that includes the partition id, something
> > like <application><topic>-<partition-id>, as shown in this slide
> > https://youtu.be/j0l_zUhQaTc?t=1047 where processor 2 should use the
> > same txnl.id A as process 1 that crashed.
> > This prevented process 2 from consuming the message again and
> > committing while process 1 could come back to life and also commit the
> > pending transaction, which would produce duplicate messages. In this
> > case process 1 will be fenced by having an outdated epoch.
> >
> > With KIP-447 we no longer have that potential scenario of two pending
> > transactions trying to produce and mark a message as committed, because
> > we won't let process 2 even start the transaction if there is a pending
> > one (basically by not returning any messages, since we reject the
> > Offset Fetch if there is a pending transaction for that offset
> > partition). This is explained in this post:
> > https://www.confluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/#client-api-simplification
> >
> > Given that, I no longer see the value of transactional.id or how I
> > should configure it in my producers. The main benefit of KIP-447 is
> > that we no longer have to start one producer per input partition; to
> > quote the post:
> > "The only way the static assignment requirement could be met is if each
> > input partition uses a separate producer instance, which is in fact
> > what Kafka Streams previously relied on. However, this made running EOS
> > applications much more costly in terms of the client resources and load
> > on the brokers. A large number of client connections could heavily
> > impact the stability of brokers and become a waste of resources as
> > well."
> >
> > I guess now I can reuse my producer across different input partitions,
> > so what transactional.id should I assign to it, and why should I care?
> > Isn't zombie fencing already resolved by rejecting the offset fetch?
> >
> > Thanks.
>
>
> --
> -- Guozhang
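P.S. To make the question concrete, this is roughly the shape of each per-topic pipeline in my setup (a simplified sketch, not my actual code: topic names, group id and broker address are placeholders, and error/rebalance handling is omitted):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class PerTopicTransactionalPipeline {
    public static void main(String[] args) {
        String group = "my-group";          // placeholder
        String inputTopic = "input-topic";  // placeholder, one instance of this pipeline per topic
        String outputTopic = "output-topic";

        // One consumer per input topic.
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, group);
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        KafkaConsumer<String, String> consumer =
            new KafkaConsumer<>(cProps, new StringDeserializer(), new StringDeserializer());
        consumer.subscribe(Collections.singletonList(inputTopic));

        // One transactional producer paired with that consumer;
        // transactional.id = <consumer-group>-<topic>-<random-uuid>.
        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
                   group + "-" + inputTopic + "-" + UUID.randomUUID());
        KafkaProducer<String, String> producer =
            new KafkaProducer<>(pProps, new StringSerializer(), new StringSerializer());
        producer.initTransactions();

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                producer.send(new ProducerRecord<>(outputTopic, record.key(), record.value()));
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
            // KIP-447 style: the consumer group metadata (member id, generation) travels
            // with the offsets, so a zombie's offset commit is rejected by the broker
            // even though the transactional.id contains a random UUID.
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
            producer.commitTransaction();
        }
    }
}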
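And for contrast, this is how I understood the pre-KIP-447 pattern from the talk, where the transactional.id must be static per input partition so that the replacement instance fences the zombie (again just an illustrative sketch under my own naming assumptions, not something from the talk verbatim):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class PrePartitionProducers {
    // Pre-KIP-447: one producer per assigned input partition, with a static
    // transactional.id derived from the partition, e.g. <application>-<topic>-<partition-id>.
    // The instance that takes over the partition after a crash reuses the same id, so
    // initTransactions() bumps the producer epoch and the old producer (the zombie)
    // fails with ProducerFencedException on its next transactional call.
    static KafkaProducer<String, String> producerFor(String application, TopicPartition tp) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
                  application + "-" + tp.topic() + "-" + tp.partition());
        KafkaProducer<String, String> producer =
            new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
        producer.initTransactions();
        return producer;
    }
}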