Hello Guozhang, thanks for the response. I have some doubts about the "N-1 producer-consumer" case you mentioned, and about why and how I would need to configure the transactional id there. Is this the case where N consumers share the same producer?
My current implementation creates a consumer per topic (I don't subscribe to multiple topics from the same consumer) and starts a producer per consumer, so the relation is 1 consumer/topic => 1 producer, and the transactional id is set as <consumer-group>-<topic>-<random-uuid>. Do you see any problem with this configuration? (I've pasted a couple of rough sketches at the bottom of this mail to make it concrete.) Thanks again.

On Sat, May 21, 2022 at 16:37, Guozhang Wang (<wangg...@gmail.com>) wrote:

> Hello Gabriel,
>
> What you're asking is a very fair question :) In fact, for Streams, where
> the partition assignment to producer-consumer pairs is purely flexible, we
> think the new EOS would not have a hard requirement on transactional.id:
> https://issues.apache.org/jira/browse/KAFKA-9453
>
> If you implemented the transactional messaging via a DIY producer+consumer
> though, it depends on what you expect the life-time of a producer to be,
> e.g. if you do not have a 1-1 producer-consumer mapping then
> transactional.id is not crucial, but if you have an N-1 producer-consumer
> mapping then you may still need to configure that id.
>
>
> Guozhang
>
>
>
> On Fri, May 20, 2022 at 8:39 AM Gabriel Giussi <gabrielgiu...@gmail.com>
> wrote:
>
> > Before KIP-447 I understood the use of transactional.id to prevent us
> > from zombies introducing duplicates, as explained in this talk:
> > https://youtu.be/j0l_zUhQaTc?t=822
> > So in order to get zombie fencing working correctly we should assign
> > producers a transactional.id that includes the partition id, something
> > like <application><topic>-<partition-id>, as shown in this slide
> > https://youtu.be/j0l_zUhQaTc?t=1047 where processor 2 should use the
> > same txnl.id A as process 1 that crashed.
> > This prevented process 2 from consuming the message again and
> > committing while process 1 could come back to life and also commit the
> > pending transaction, which would produce duplicate messages. In this
> > case process 1 will be fenced by having an outdated epoch.
> >
> > With KIP-447 we no longer have that potential scenario of two pending
> > transactions trying to produce and mark a message as committed, because
> > we won't let process 2 even start the transaction if there is a pending
> > one (basically by not returning any messages, since we reject the
> > Offset Fetch if there is a pending transaction for that offset
> > partition). This is explained in this post:
> > https://www.confluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/#client-api-simplification
> >
> > Given that, I no longer see the value of transactional.id or how I
> > should configure it in my producers. The main benefit of KIP-447 is
> > that we no longer have to start one producer per input partition; to
> > quote the post:
> > "The only way the static assignment requirement could be met is if each
> > input partition uses a separate producer instance, which is in fact
> > what Kafka Streams previously relied on. However, this made running EOS
> > applications much more costly in terms of the client resources and load
> > on the brokers. A large number of client connections could heavily
> > impact the stability of brokers and become a waste of resources as
> > well."
> >
> > I guess now I can reuse my producer across different input partitions,
> > so what transactional.id should I assign to it, and why should I care?
> > Isn't zombie fencing already resolved by rejecting the offset fetch?
> >
> > Thanks.
>
>
> --
> -- Guozhang
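P.S. To make the question concrete, this is roughly the shape of each per-topic pipeline in my setup (a simplified sketch, not my actual code: topic names, group id and broker address are placeholders, and error/rebalance handling is omitted):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class PerTopicTransactionalPipeline {
    public static void main(String[] args) {
        String group = "my-group";          // placeholder
        String inputTopic = "input-topic";  // placeholder, one instance of this pipeline per topic
        String outputTopic = "output-topic";

        // One consumer per input topic.
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, group);
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        KafkaConsumer<String, String> consumer =
            new KafkaConsumer<>(cProps, new StringDeserializer(), new StringDeserializer());
        consumer.subscribe(Collections.singletonList(inputTopic));

        // One transactional producer paired with that consumer;
        // transactional.id = <consumer-group>-<topic>-<random-uuid>.
        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
                   group + "-" + inputTopic + "-" + UUID.randomUUID());
        KafkaProducer<String, String> producer =
            new KafkaProducer<>(pProps, new StringSerializer(), new StringSerializer());
        producer.initTransactions();

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                producer.send(new ProducerRecord<>(outputTopic, record.key(), record.value()));
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
            // KIP-447 style: the consumer group metadata (member id, generation) travels
            // with the offsets, so a zombie's offset commit is rejected by the broker
            // even though the transactional.id contains a random UUID.
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
            producer.commitTransaction();
        }
    }
}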
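And for contrast, this is how I understood the pre-KIP-447 pattern from the talk, where the transactional.id must be static per input partition so that the replacement instance fences the zombie (again just an illustrative sketch under my own naming assumptions, not something from the talk verbatim):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class PrePartitionProducers {
    // Pre-KIP-447: one producer per assigned input partition, with a static
    // transactional.id derived from the partition, e.g. <application>-<topic>-<partition-id>.
    // The instance that takes over the partition after a crash reuses the same id, so
    // initTransactions() bumps the producer epoch and the old producer (the zombie)
    // fails with ProducerFencedException on its next transactional call.
    static KafkaProducer<String, String> producerFor(String application, TopicPartition tp) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG,
                  application + "-" + tp.topic() + "-" + tp.partition());
        KafkaProducer<String, String> producer =
            new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
        producer.initTransactions();
        return producer;
    }
}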