Since we've seen quite a lot of questions about EOS on the mailing list
recently, I think it's worth adding an FAQ entry here:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ

That way, we can refer future questions to the page rather than answering
them repeatedly. @Matthias J Sax <matth...@confluent.io>: would you like to
do it?


Guozhang

On Tue, Feb 19, 2019 at 3:12 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> Even if the question was sent 4 times to the mailing list, I am only
> answering it exactly-once (sorry for the bad joke; could not resist...)
>
>
> You have to distinguish between "idempotent producer" and "transactional
> producer".
>
> If you enable idempotent writes (config `enable.idempotence`), your
> producer is assigned a cluster-wide unique PID. This PID, together
> with the sequence number, is used on the broker side to de-duplicate
> messages on write (in case the producer retries). Different producers can
> use the same sequence numbers, so PIDs are used to distinguish different
> producers and get unique PID-seqNum pairs.
>
> Idempotent writes apply to single messages in isolation only. On the
> consumer side, nothing changes because no transactions are used
> (the `isolation.level` config has no impact).
>
>
> If you want to write multiple messages in an atomic manner (e.g., write
> all 5 messages or none of them), you need to use transactions. In this
> case you also assign a `transactional.id` on the producer side and should
> configure consumers with `read_committed` mode. The `transactional.id`
> is required to abort in-flight transactions in case a producer has an
> open transaction, crashes, and is restarted. (A PID is not sufficient,
> because it's lost on a crash.) When there is an open transaction and the
> producer crashes and is restarted with the same `transactional.id`, the
> broker detects the open transaction and aborts it automatically.
>
> Compacted topics and transactions that span multiple log segments are
> not special cases. They work like regular transactions.
>
>
> -Matthias
>
>
> On 2/19/19 5:14 AM, Greenhorn Techie wrote:
> > Hi,
> >
> > Our data getting into Kafka is transactional in nature and hence I am
> > trying to understand EOS better. My present understanding is as below:
> >
> > It is mentioned that when a producer starts, it gets a new PID, which is
> > only valid for that session. Does that mean it is a prerequisite to have
> > the same / single producer session for exactly-once guarantees? I presume
> > it is not required. As per my understanding, this is where transactional.id
> > comes into the picture: it is user-defined and hence can survive producer
> > restarts.
> >
> > I have a few questions about this:
> >
> > 1. If the above statement is correct, why do we need a PID in the first
> > place, instead of using transactional.id everywhere?
> > 2. I understand that the sequence number is generated by the producer and
> > increases monotonically. Does that mean the sequence number changes across
> > producer restarts, along with a new PID?
> > 3. Is the PID meant mainly for idempotence, whereas transactional.id is for
> > transactional support?
> > 4. On the consumer side, only one config parameter is defined, i.e.,
> > isolation.level. For EOS, I presume this needs to be set to
> > ‘read_committed’ only. For EOS, it should never be set to
> > ‘read_uncommitted’?
> > 5. What is the impact of setting ‘enable.idempotence’ to true without
> > setting ‘transactional.id’ on the producer side? Does it have any
> > (side) effects?
> > 6. How does EOS work for compacted topics? Will the EOS behaviour be any
> > different for compacted topics?
> > 7. How does EOS work when a transaction is written across two different
> > log segments?
> >
> > Can anyone please help me understand the nuances around EOS guarantees?
> >
> > Thanks
> >
>
>

-- 
-- Guozhang
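
Matthias's description of idempotent writes (the broker de-duplicating on the PID and sequence-number pair) can be illustrated with a toy model of a single partition. This is a simplified sketch under stated assumptions, not Kafka's broker code: `IdempotentPartition` and `append` are hypothetical names, the model remembers only the last accepted sequence number per PID, and it rejects sequence gaps with a plain `ValueError` where a real broker returns its own error to the producer.

```python
class IdempotentPartition:
    """Hypothetical toy model of broker-side de-duplication for one partition.

    Not Kafka's implementation: it only remembers the last sequence number
    accepted per PID and drops any write whose (PID, seqNum) it already saw.
    """

    def __init__(self):
        self.records = []   # accepted values, in log order
        self.last_seq = {}  # PID -> last accepted sequence number

    def append(self, pid, seq, value):
        """Return True if the record is written, False if dropped as a retry."""
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            return False  # same PID, already-seen seqNum: a producer retry
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq} for PID {pid}")
        self.last_seq[pid] = seq
        self.records.append(value)
        return True


log = IdempotentPartition()
print(log.append(pid=7, seq=0, value="a"))  # True
print(log.append(pid=7, seq=0, value="a"))  # False: a retry, dropped
print(log.append(pid=8, seq=0, value="b"))  # True: same seqNum, different PID
print(log.records)                          # ['a', 'b']
```

Because a restarted producer is assigned a new PID, this kind of de-duplication only covers retries within a single producer session.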

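The transactional behaviour from the same reply (all-or-nothing writes, `read_committed` consumers, and the automatic abort of an open transaction when a producer restarts with the same `transactional.id`) can likewise be sketched as a toy model. Assumptions: `TxnPartition` and its methods are hypothetical, transactions are numbered with a local counter, and plain Python sets stand in for the transaction coordinator state and the commit/abort markers Kafka writes into the log. As with real `read_committed` consumers, reading stops at the first record of a still-open transaction instead of skipping past it.

```python
class TxnPartition:
    """Hypothetical toy model of one partition written by transactional producers.

    Plain sets stand in for Kafka's transaction coordinator state and for the
    commit/abort markers written into the log.
    """

    def __init__(self):
        self.records = []       # (txn_no, value) pairs in log order
        self.open_txn = {}      # transactional.id -> number of its open transaction
        self.committed = set()  # transaction numbers that committed
        self.aborted = set()    # transaction numbers that were aborted
        self._next_txn = 0

    def init_producer(self, transactional_id):
        # A (re)started producer registers its transactional.id. A transaction
        # left open by a crashed earlier instance is aborted automatically.
        if transactional_id in self.open_txn:
            self.aborted.add(self.open_txn.pop(transactional_id))

    def begin(self, transactional_id):
        self._next_txn += 1
        self.open_txn[transactional_id] = self._next_txn

    def send(self, transactional_id, value):
        # Records land in the log immediately, tagged with their transaction.
        self.records.append((self.open_txn[transactional_id], value))

    def commit(self, transactional_id):
        self.committed.add(self.open_txn.pop(transactional_id))

    def abort(self, transactional_id):
        self.aborted.add(self.open_txn.pop(transactional_id))

    def read(self, isolation_level):
        if isolation_level == "read_uncommitted":
            return [value for _, value in self.records]
        if isolation_level != "read_committed":
            raise ValueError(f"unknown isolation.level: {isolation_level}")
        visible = []
        for txn, value in self.records:
            if txn in self.aborted:
                continue  # aborted records are filtered out
            if txn not in self.committed:
                break  # still-open transaction: stop, like the last stable offset
            visible.append(value)
        return visible


# Usage: a producer crashes mid-transaction and restarts with the same id.
p = TxnPartition()
p.init_producer("orders")
p.begin("orders")
for i in range(5):
    p.send("orders", f"m{i}")
p.init_producer("orders")  # restart: the open transaction is aborted
p.begin("orders")
p.send("orders", "x")
p.send("orders", "y")
p.commit("orders")
print(p.read("read_committed"))    # ['x', 'y']
print(p.read("read_uncommitted"))  # ['m0', 'm1', 'm2', 'm3', 'm4', 'x', 'y']
```

The `read_uncommitted` view shows why EOS consumers need `read_committed`: the five records of the aborted transaction are still physically in the log.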