Even if the question was sent 4 times to the mailing list, I am only answering it exactly-once (sorry for the bad joke -- could not resist...)

You have to distinguish between "idempotent producer" and "transactional producer".

If you enable idempotent writes (config `enable.idempotence`), your producer will get a cluster-wide unique producer ID (PID) assigned. This PID, together with the sequence number, is used broker side to de-duplicate messages on write (in case the producer retries). Different producers can use the same sequence numbers, so the PID is used to distinguish different producers and get unique PID-seqNum pairs. Idempotent writes apply to single messages in isolation only. Consumer side, there is no change because no transactions are used (the `isolation.level` config has no impact).

If you want to write multiple messages in an atomic manner (i.e., write all 5 messages or none of them), you need to use transactions. For this case, you also assign a `transactional.id` producer side and should configure consumers with `isolation.level` set to `read_committed`.

The `transactional.id` is required to abort in-flight transactions in case a producer has an open transaction, crashes, and is restarted. (A PID is not sufficient, because it's lost on a crash.) When a producer with an open transaction crashes and is restarted, the broker will detect the open transaction (i.e., same `transactional.id`) and abort it automatically.

Compacted topics and transactions that span multiple log segments are not special cases. They work like regular transactions.

-Matthias

On 2/19/19 5:14 AM, Greenhorn Techie wrote:
> Hi,
>
> Our data getting into Kafka is transactional in nature and hence I am
> trying to understand EOS better. My present understanding is as below:
>
> It is mentioned that when producer starts, it will have a new PID, but only
> valid till the session. Does that mean, is it a pre-requisite to have the
> same / single producer session for exactly-once guarantees? I presume it is
> not required. As per my understanding, this is where transactional.id comes
> into picture which is user defined and hence can survive producer restarts.
>
> I have few questions regarding the same:
>
> 1. If the above statement is correct, why do we need PID in the first place
> and instead use transactional.id all over?
> 2. I understand that sequence number is something that is generated by
> producer and increases monotonically. Does that mean, the sequence number
> changes across producer restarts along with a new PID?
> 3. Is PID meant mainly for idempotence where as transactional.id is for
> transactional support?
> 4. On the consumer side, only one config parameter is defined i.e.
> isolation.level. For EOS, I presume this needs to be set to
> ‘read_committed’ only. For EOS, it should never be set to ‘read_uncommitted’
> 5. What is the impact of setting ‘enable.idempotence’ to true without
> setting ‘transactional.id’ on the producer side? Does it have any
> (side)effect?
> 6. How does EOS work for compacted topics? Will the EOS behaviour be any
> different for compacted topics?
> 7. How does EOS work when transactions are written to two different log
> segments?
>
> Can anyone please help me understand the nuances around EOS guarantees?
>
> Thanks
>
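
As a rough illustration of the PID-plus-sequence-number de-duplication described in the reply, here is a toy Python model of a single partition. It is not Kafka code: `ToyBroker`, `register_producer`, and `append` are made-up names, and the model ignores batching, sequence-gap checks, and transactions. It only shows why a retry is dropped, why two producers can reuse the same sequence numbers, and why a restarted producer looks like a new one to the broker.

```python
from itertools import count


class ToyBroker:
    """Toy model of one partition (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []             # messages actually written
        self.last_seq = {}        # PID -> highest sequence number written so far
        self._next_pid = count()  # source of unique PIDs

    def register_producer(self):
        # Every (re)started producer session is handed a fresh PID.
        return next(self._next_pid)

    def append(self, pid, seq, value):
        """Write the message unless this (pid, seq) was already written.

        Returns True if the message was written, False if it was a duplicate.
        """
        if seq <= self.last_seq.get(pid, -1):
            return False          # retry of a message that already made it
        self.last_seq[pid] = seq
        self.log.append(value)
        return True


broker = ToyBroker()
p1 = broker.register_producer()
p2 = broker.register_producer()

broker.append(p1, 0, "a")
broker.append(p1, 0, "a")   # retry after a lost ack: de-duplicated
broker.append(p2, 0, "b")   # same sequence number, different PID: accepted
broker.append(p1, 1, "c")

print(broker.log)           # ['a', 'b', 'c']

# A restarted producer gets a new PID, so nothing links it to its old session.
p1_restarted = broker.register_producer()
print(p1_restarted != p1)   # True
```

In this sketch nothing connects `p1_restarted` to `p1`, which is the gap a user-chosen `transactional.id` fills for transactions: it stays the same across restarts.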